Abstract

In the existing financial literature, entropy based ideas have been proposed in portfolio optimization, in model calibration for options pricing as well as in ascertaining a pricing measure in incomplete markets. The abstracted problem corresponds to finding a probability measure that minimizes the relative entropy (also called I-divergence) with respect to a known measure while it satisfies certain moment constraints on functions of underlying assets. In this paper, we show that under I-divergence, the optimal solution may not exist when the underlying assets have fat tailed distributions, ubiquitous in financial practice. We note that this drawback may be corrected if 'polynomial-divergence' is used. This divergence can be seen to be equivalent to the well known (relative) Tsallis or (relative) Renyi entropy. We discuss existence and uniqueness issues related to this new optimization problem as well as the nature of the optimal solution under different objectives. We also identify the optimal solution structure under I-divergence as well as polynomial-divergence when the associated constraints include those on marginal distribution of functions of underlying assets. These results are applied to a simple problem of model calibration to options prices as well as to portfolio modeling in Markowitz framework, where we note that a reasonable view that a particular portfolio of assets has heavy tailed losses may lead to fatter and more reasonable tail distributions of all assets.



1 Introduction 

Entropy based ideas have found a number of popular applications in finance over the last two 
decades. A key application involves portfolio optimization where these are used (see, e.g., 
Meucci [33]) to arrive at a 'posterior' probability measure that is closest to the specified 'prior' 
probability measure and satisfies expert views modeled as constraints on certain moments 
associated with the posterior probability measure. Another important application involves 
calibrating the risk neutral probability measure used for pricing options (see, e.g., Buchen and 
Kelly [5], Stutzer [H], Avellaneda et al. [3]). Here entropy based ideas are used to arrive at
a probability measure that correctly prices given liquid options while again being closest to a 
specified 'prior' probability measure. 

In these works, relative entropy or I-divergence is used as a measure of distance between two probability measures. One advantage of I-divergence is that under mild conditions, the posterior probability measure exists and has an elegant representation when the underlying model corresponds to a distribution of light tailed random variables (that is, when the moment generating function exists in a neighborhood around zero). However, we note that this is no longer true when the underlying random variables may be fat-tailed, as is often the case in finance and insurance settings. One of our key contributions is to observe that when probability distance measures corresponding to 'polynomial-divergence' (defined later) are used in place of I-divergence, under technical conditions, the posterior probability measure exists and has an elegant representation even when the underlying random variables may be fat-tailed. Thus, this provides a reasonable way to incorporate restrictions in the presence of fat tails. Another main contribution is a methodology to arrive at a posterior probability measure when the constraints on this measure are of a general nature that include specification of marginal distributions of functions of underlying random variables. For instance, in portfolio optimization settings, an expert may have a view that a certain stock index has a fat-tailed t-distribution and is looking for a posterior distribution that satisfies this requirement while being closest to a prior model that may, for instance, be based on historical data.

1.1 Literature Overview 

The evolving literature on updating models for portfolio optimization builds upon the pioneering work of Black and Litterman [3] (BL). BL consider variants of Markowitz's model where the subjective views of portfolio managers are used as constraints to update models of the market using ideas from Bayesian analysis. Their work focuses on a Gaussian framework with views restricted to linear combinations of expectations of returns from different securities. Since then a number of variations and improvements have been suggested (see, e.g., [31], [35] and [37]). Recently, Meucci [33] proposed 'entropy pooling' where the original model can involve general distributions and views can be on any characteristic of the probability model. Specifically, he focuses on approximating the original distribution of asset returns by a discrete one generated via Monte-Carlo sampling (or from data). Then a convex programming problem is solved that adjusts weights to these samples so that they minimize the I-divergence from the original sampled distribution while satisfying the view constraints. These samples with updated weights are then used for portfolio optimization. Earlier, Avellaneda et al. [3] used a similar weighted Monte Carlo methodology to calibrate asset pricing models to market data (see also Glasserman and Yu [21]). Buchen and Kelly in [5] and Stutzer in [U] use the entropy approach to calibrate one-period asset pricing models by selecting a pricing measure that correctly prices a set of benchmark instruments while minimizing I-divergence from a prior specified model that may, for instance, be estimated from historical data (see also the recent survey article [ID]).

Zhou et al. [13] consider a statistical learning framework for estimating default probabilities of private firms over a fixed horizon of time, given market data on various 'explanatory variables' like stock prices, financial ratios and other economic indicators. They use the entropy approach to calibrate the default probabilities (see also Friedman and Sandow [IS]).

Another popular application of entropy based ideas to finance is in valuation of non-hedgeable payoffs in incomplete markets. In an incomplete market there may exist many probability measures equivalent to the physical probability measure under which discounted price processes of the underlying assets are martingales. Frittelli ([19]) proposes using a probability measure that minimizes the I-divergence from the physical probability measure for pricing purposes (he calls this the minimal entropy martingale measure or MEMM). Here, the underlying financial model corresponds to a continuous time stochastic process. Cont and Tankov ([6]) consider, in addition, the problem of incorporating calibration constraints into MEMM for exponential Lévy processes (see also Kallsen [28]). Jeanblanc et al. ([26]) propose using polynomial-divergence (they call it $f^q$-divergence distance) instead of I-divergence and obtain necessary and sufficient conditions for existence of an equivalent martingale measure (EMM) with minimal polynomial-divergence from the physical measure for exponential Lévy processes. They also characterize the form of the density of the optimal measure. Goll and Rüschendorf ([22]) consider general f-divergence in a general semi-martingale setting to single out an EMM which also satisfies calibration constraints.

A brief historical perspective on related entropy based literature may be in order. This concept was first introduced by Gibbs in the framework of the classical theory of thermodynamics (see [20]), where entropy was defined as a measure of disorder in thermodynamical systems. Later, Shannon [39] (see also [30]) proposed that entropy could be interpreted as a measure of missing information of a random system. Jaynes [24], [25] further developed this idea in the framework of statistical inference and gave the mathematical formulation of the principle of maximum entropy (PME), which states that given partial information/views or constraints about an unknown probability distribution, among all the distributions that satisfy these restrictions, the distribution that maximizes the entropy is the one that is least prejudiced, in the sense of being minimally committal to missing information. When a prior probability distribution, say, $\mu$ is given, one can extend the above principle to the principle of minimum cross entropy (PMXE), which states that among all probability measures which satisfy a given set of constraints, the one with minimum relative entropy (or the I-divergence) with respect to $\mu$ is the one that is maximally committal to the prior $\mu$. See [2S] for numerous applications of PME and PMXE in diverse fields of science and engineering. See [12], p], [H], [23] and [27] for axiomatic justifications for Renyi and Tsallis entropy. For the information theoretic import of I-divergence and its relation to utility maximization see Friedman et al. [17] and Slomczynski and Zastawniak [40].

1.2 Our Contributions 

In this article, we restrict attention to examples related to portfolio optimization and model calibration and build upon ideas proposed by Avellaneda et al. [3], Buchen and Kelly [5], Meucci [33] and others. We first note the well known result that for views expressed as a finite number of moment constraints, the optimal solution to the I-divergence minimization can be characterized as a probability measure obtained by suitably exponentially twisting the original measure. This measure is known in the literature as the Gibbs measure, and our analysis is based on the well known ideas involved in the Gibbs conditioning principle (see, for instance, [13]). As mentioned earlier, such a characterization may fail when the underlying distributions are fat-tailed in the sense that their moment generating function does not exist in some neighborhood of the origin. We show that one reasonable way to get a good change of measure that incorporates views¹ in this setting is by replacing I-divergence by a suitable 'polynomial-divergence' as an objective in our optimization problem. We characterize the optimal solution in this setting, and prove its uniqueness under technical conditions. Our definition of polynomial-divergence is a special case of a more general concept of f-divergence introduced by Csiszar in [8], [9] and [10]. Importantly, polynomial-divergence is a monotonically increasing function of the well known relative Tsallis entropy [42] and relative Renyi entropy [38], and moreover, under an appropriate limit, converges to I-divergence.

As indicated earlier, we also consider the case where the expert views may specify the marginal probability distribution of functions of the random variables involved. We show that such views, in addition to views on moments of functions of underlying random variables, are easily incorporated. In particular, under technical conditions, we characterize the optimal solution with these general constraints, when the objective may be I-divergence or polynomial-divergence, and show the uniqueness of the resulting optimal probability measure in each case.

As an illustration, we apply these results to portfolio modeling in the Markowitz framework, where the returns from a finite number of assets have a multivariate Gaussian distribution and the expert view is that a certain portfolio of returns is fat-tailed. We show that in the resulting probability measure, under mild conditions, all assets are similarly fat-tailed. Thus, this becomes a reasonable way to incorporate realistic tail behavior in a portfolio of assets. Generally speaking, the proposed approach may be useful in better risk management by building conservative tail views into mathematical models.

Note that a key reason to propose polynomial-divergence is that it provides a tractable and elegant way to arrive at a reasonable updated distribution close to the given prior distribution while incorporating constraints and views even when fat-tailed distributions are involved. It is natural to try to understand the influence of the choice of objective function on the resultant optimal probability measure. We address this issue for a simple example where we compare three reasonable objectives: the total variational distance, I-divergence and polynomial-divergence. We discuss the differences in the resulting solutions. To shed further light on this, we also observe that when views are expressed as constraints on probability values of disjoint sets, the optimal solution is the same in all three cases. Furthermore, it has a simple representation. We also conduct numerical experiments on practical examples to validate the proposed methodology.

1.3 Organization 

In Section 2, we outline the mathematical framework and characterize the optimal probability measure that minimizes the I-divergence with respect to the original probability measure subject to views expressed as moment constraints of specified functions. In Section 3, we show through an example that I-divergence may be an inappropriate objective function in the presence of fat-tailed distributions. We then define polynomial-divergence and characterize the optimal probability measure that minimizes this divergence subject to constraints on moments. The uniqueness of the optimal measure, when it exists, is proved under technical assumptions. We also discuss existence of the solution in a simple setting. When a solution does not exist under the proposed methodology, we also propose a weighted least squares based modification that finds a reasonable perturbed solution to the problem. In Section 4, we extend the methodology to incorporate views on marginal distributions of some random variables, along with views on moments of functions of underlying random variables. We characterize the optimal probability measures that minimize I-divergence and polynomial-divergence in this setting and prove the uniqueness of the optimal measure when it exists. In Section 5, we apply our results to the portfolio problem in the Markowitz framework and develop explicit expressions for the posterior probability measure. We also show how a view that a portfolio of assets has a 'regularly varying' fat-tailed distribution renders a similar fat-tailed marginal distribution to all assets correlated to this portfolio. Section 6 is devoted to comparing, on a simple example, the qualitative differences in the resulting optimal probability measures when the objective function is I-divergence, polynomial-divergence and total variational distance. In this section, we also note that when views are on probabilities of disjoint sets, all three objectives give identical results. We numerically test our proposed algorithms on practical examples in Section 7. Finally, we end in Section 8 with a brief conclusion. All but the simplest proofs are relegated to the Appendix.

1 In this article, we often use 'views' or 'constraints' interchangeably.

We thank the anonymous referee for bringing Friedman et al. [18] to our notice. This article also considers f-divergence and in particular, polynomial-divergence (they refer to an equivalent quantity as w-entropy) in the setting of univariate power-law distributions, which include the Pareto and skewed generalized t-distributions. It motivates the use of polynomial-divergence through utility maximization considerations and develops a practical and robust calibration technique for univariate asset return densities.

2 Incorporating Views using I-Divergence

Some notation and basic concepts are needed to support our analysis. Let $(\Omega, \mathcal{F}, \mu)$ denote the underlying probability space. Let $\mathcal{P}$ be the set of all probability measures on $(\Omega, \mathcal{F})$. For any $\nu \in \mathcal{P}$ the relative entropy of $\nu$ w.r.t. $\mu$, or I-divergence of $\nu$ w.r.t. $\mu$ (equivalently, the Kullback-Leibler distance²), is defined as

$$D(\nu \,\|\, \mu) := \int \log\Big(\frac{d\nu}{d\mu}\Big)\, d\nu$$

if $\nu$ is absolutely continuous with respect to $\mu$ and $\log(\frac{d\nu}{d\mu})$ is integrable w.r.t. $\nu$; $D(\nu \,\|\, \mu) = +\infty$ otherwise. See, for instance, [7] for concepts related to relative entropy.

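Let $\mathcal{P}(\mu)$ be the set of all probability measures which are absolutely continuous w.r.t. $\mu$, let $\psi : \Omega \to \mathbb{R}$ be a measurable function such that $\int |\psi| e^{\psi}\, d\mu < \infty$, and let

$$\Lambda(\psi) := \log \int e^{\psi}\, d\mu \in (-\infty, +\infty]$$

denote the logarithmic moment generating function of $\psi$ w.r.t. $\mu$. Then it is well known that

$$\Lambda(\psi) = \sup_{\nu \in \mathcal{P}(\mu)} \Big\{ \int \psi\, d\nu - D(\nu \,\|\, \mu) \Big\}. \tag{1}$$

Furthermore, this supremum is attained at $\nu^*$ given by

$$\frac{d\nu^*}{d\mu} = \frac{e^{\psi}}{\int e^{\psi}\, d\mu}$$

(see, for instance, [7], [2H], [TB]).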
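On a finite sample space the variational formula (1) and the form of its maximizer $\nu^*$ can be checked directly. The following is a minimal numerical sketch, assuming an illustrative prior $\mu$ on six points and an illustrative function $\psi$, both drawn at random; the helper names are ours and are not taken from the cited references.

```python
import numpy as np

def log_mgf(psi, mu):
    """Lambda(psi) = log sum_w exp(psi(w)) mu(w), shifted for numerical stability."""
    m = psi.max()
    return m + np.log(np.sum(mu * np.exp(psi - m)))

def kl(nu, mu):
    """D(nu || mu) = sum_w nu(w) log(nu(w)/mu(w)) for strictly positive vectors."""
    return float(np.sum(nu * np.log(nu / mu)))

def gibbs_measure(psi, mu):
    """nu* with dnu*/dmu = exp(psi) / int exp(psi) dmu."""
    w = mu * np.exp(psi - psi.max())
    return w / w.sum()

def objective(nu, psi, mu):
    """int psi dnu - D(nu || mu), the quantity maximised in (1)."""
    return float(nu @ psi) - kl(nu, mu)

rng = np.random.default_rng(0)
mu = rng.dirichlet(np.ones(6))   # illustrative prior on six points
psi = rng.normal(size=6)         # illustrative function psi

nu_star = gibbs_measure(psi, mu)
print(log_mgf(psi, mu), objective(nu_star, psi, mu))   # agree up to rounding

# Randomly drawn competitors should never exceed Lambda(psi).
others = [objective(rng.dirichlet(np.ones(6)), psi, mu) for _ in range(1000)]
print(max(others) <= log_mgf(psi, mu))
```

Since $\log(d\nu^*/d\mu) = \psi - \Lambda(\psi)$, the objective evaluated at $\nu^*$ collapses to $\Lambda(\psi)$, which is what the first printed line reflects.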

2 Though this is not a distance in the strict sense of a metric.



In our optimization problem we look for a probability measure $\nu \in \mathcal{P}(\mu)$ that minimizes the I-divergence w.r.t. $\mu$. We restrict our search to probability measures that satisfy moment constraints $\int g_i\, d\nu \ge c_i$ and/or $\int g_i\, d\nu = c_i$, where each $g_i$ is a measurable function. For instance, views on the probability of certain sets can be modeled by setting the $g_i$'s as indicator functions of those sets. If our underlying space supports random variables $(X_1, \ldots, X_n)$ under the probability measure $\mu$, one may set $g_i = f_i(X_1, \ldots, X_n)$ so that the associated constraint is on the expectation of these functions.

Formally, our optimization problem $O_1$ is:

$$\min_{\nu \in \mathcal{P}(\mu)} \int \log\Big(\frac{d\nu}{d\mu}\Big)\, d\nu \tag{2}$$

subject to the constraints:

$$\int g_i\, d\nu \ge c_i \tag{3}$$

for $i = 1, \ldots, k_1$; and

$$\int g_i\, d\nu = c_i \tag{4}$$

for $i = k_1 + 1, \ldots, k$. Here $k_1$ can take any value between 0 and $k$.

The solution to this problem is characterized under the following assumption:

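Assumption 1 There exist $\lambda_i \ge 0$ for $i = 1, \ldots, k_1$, and $\lambda_{k_1+1}, \ldots, \lambda_k \in \mathbb{R}$ such that

$$\int e^{\sum_{i=1}^k \lambda_i g_i}\, d\mu < \infty$$

and the probability measure $\nu^0$ given by

$$\nu^0(A) = \frac{\int_A e^{\sum_{i=1}^k \lambda_i g_i}\, d\mu}{\int e^{\sum_{i=1}^k \lambda_i g_i}\, d\mu} \tag{5}$$

for all $A \in \mathcal{F}$ satisfies the constraints (3) and (4). Furthermore, the complementary slackness conditions

$$\lambda_i \Big( c_i - \int g_i\, d\nu^0 \Big) = 0$$

hold for $i = 1, \ldots, k_1$.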

The following theorem then holds:

Theorem 1 Under Assumption 1, $\nu^0$ is an optimal solution to $O_1$.

This theorem is well known and a proof using the Lagrange multiplier method can be found in [Z], [IS], [5] and [2]. For completeness we sketch the proof below.



Proof of Theorem 1. $O_1$ is equivalent to maximizing $-D(\nu \,\|\, \mu) = -\int \log(\frac{d\nu}{d\mu})\, d\nu$ subject to the constraints (3) and (4). The Lagrangian for the above maximization problem is:

$$\mathcal{L} = \sum_{i=1}^k \lambda_i \Big( \int g_i\, d\nu - c_i \Big) + \big(-D(\nu \,\|\, \mu)\big) = \int \psi\, d\nu - D(\nu \,\|\, \mu) - \sum_{i=1}^k \lambda_i c_i,$$

where $\psi = \sum_{i=1}^k \lambda_i g_i$. Then by (1) and the preceding discussion, it follows that $\nu^0$ maximizes $\mathcal{L}$. By Lagrangian duality, due to Assumption 1, $\nu^0$ also solves $O_1$. □

Note that to obtain the optimal distribution by formula (5), we must solve the constraint equations for the Lagrange multipliers $\lambda_1, \lambda_2, \ldots, \lambda_k$. The constraint equations, with their explicit dependence on the $\lambda_i$'s, can be written as:

$$\frac{\int g_j\, e^{\sum_{i=1}^k \lambda_i g_i}\, d\mu}{\int e^{\sum_{i=1}^k \lambda_i g_i}\, d\mu} = c_j, \quad j = 1, 2, \ldots, k. \tag{6}$$

This is a set of $k$ nonlinear equations in $k$ unknowns $\lambda_1, \lambda_2, \ldots, \lambda_k$, and typically would require numerical procedures for solution. It is easy to see that if the constraint equations are not consistent then no solution exists. For a sufficient condition for existence of a solution, see Theorem 3.3 of [IT]. On the other hand, when a solution does exist, it is helpful for applying numerical procedures to know if it is unique. It can be shown that the Jacobian matrix of the set of equations (6) is given by the variance-covariance matrix of $g_1, g_2, \ldots, g_k$ under the measure given by (5). This variance-covariance matrix is also equal to the Hessian of the following function

$$G(\lambda_1, \lambda_2, \ldots, \lambda_k) := \log \int e^{\sum_{i=1}^k \lambda_i g_i}\, d\mu - \sum_{i=1}^k \lambda_i c_i.$$

For details, see [5] or [2]. It is easily checked that (6) is the same as

$$\Big( \frac{\partial G}{\partial \lambda_1}, \frac{\partial G}{\partial \lambda_2}, \ldots, \frac{\partial G}{\partial \lambda_k} \Big) = 0. \tag{7}$$

It follows that if no non-zero linear combination of $g_1, g_2, \ldots, g_k$ has zero variance under the measure given by (5), then $G$ is strictly convex, and if a solution to (6) exists, it is unique. It also follows that instead of employing a root-search procedure to solve (6) for the $\lambda_i$'s, one may as well find a local minimum (which is also global) of the function $G$ numerically. We end this section with a simple example.
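The dual route just described (minimize the convex function $G$, whose gradient is the constraint residual and whose Hessian is the covariance matrix of the $g_i$ under (5)) can be sketched on a discrete prior, for instance equally weighted Monte Carlo samples in the spirit of the weighted Monte Carlo methods cited in Section 1.1. This is a minimal illustration, assuming equality constraints only, a damped Newton iteration with a backtracking line search, and hypothetical views (mean 0.3 and $P(X > 1) = 0.2$ for standard normal samples); the function name is ours.

```python
import numpy as np

def solve_gibbs_multipliers(g, mu, c, tol=1e-10, max_iter=100):
    """Find lam minimising G(lam) = log sum_j mu_j exp(lam . g_j) - lam . c.

    g : (N, k) array with g[j, i] = g_i at support point j
    mu: (N,) prior probabilities, c: (k,) equality-constraint targets.
    Returns lam and the twisted probabilities nu of formula (5).
    """
    def G(lam):
        s = g @ lam
        m = s.max()
        return m + np.log(mu @ np.exp(s - m)) - lam @ c

    def twisted(lam):
        s = g @ lam
        w = mu * np.exp(s - s.max())
        return w / w.sum()

    lam = np.zeros(g.shape[1])
    for _ in range(max_iter):
        nu = twisted(lam)
        mean = nu @ g
        grad = mean - c                         # gradient of G, cf. (7)
        if np.max(np.abs(grad)) < tol:
            break
        d = g - mean
        hess = d.T @ (nu[:, None] * d)          # covariance of g under nu
        step = np.linalg.solve(hess, grad)      # Newton direction
        t = 1.0
        while G(lam - t * step) > G(lam) - 1e-4 * t * (grad @ step) and t > 1e-10:
            t *= 0.5                            # backtracking (Armijo) line search
        lam = lam - t * step
    return lam, twisted(lam)

# Illustrative use: reweight standard normal samples so that the
# mean is 0.3 and P(X > 1) = 0.2 (both views are hypothetical).
rng = np.random.default_rng(1)
x = rng.standard_normal(20_000)
g = np.column_stack([x, (x > 1.0).astype(float)])
mu = np.full(x.size, 1.0 / x.size)
c = np.array([0.3, 0.2])
lam, nu = solve_gibbs_multipliers(g, mu, c)
print(lam, nu @ g)
```

Minimizing $G$ rather than root-searching (6) means every step can be safeguarded by a decrease of $G$, which the line search exploits.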

Example 1 Suppose that under $\mu$, random variables $X = (X_1, \ldots, X_n)$ have a multivariate normal distribution $N(a, \Sigma)$, that is, with mean $a \in \mathbb{R}^n$ and variance-covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$. If the constraints correspond to their mean vector being equal to $\bar{a}$, then this can be achieved by a new probability measure $\nu^0$ obtained by exponentially twisting $\mu$ by a vector $\lambda \in \mathbb{R}^n$ such that

$$\lambda = (\Sigma^{-1})^T (\bar{a} - a).$$

Then, under $\nu^0$, $X$ is $N(\bar{a}, \Sigma)$ distributed.
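Example 1 can be checked by Monte Carlo: reweighting samples from $N(a, \Sigma)$ by $e^{\lambda^T x}$ should reproduce the mean $\bar a$ while leaving the covariance unchanged. The values of $a$, $\Sigma$ and $\bar a$ below are illustrative assumptions.

```python
import numpy as np

a = np.array([0.0, 0.0])                 # prior mean (illustrative)
S = np.array([[1.0, 0.5], [0.5, 2.0]])   # prior covariance Sigma (illustrative)
a_bar = np.array([0.3, -0.2])            # viewed mean (illustrative)

lam = np.linalg.solve(S.T, a_bar - a)    # lam = (Sigma^{-1})^T (a_bar - a)

rng = np.random.default_rng(2)
x = rng.multivariate_normal(a, S, size=200_000)
s = x @ lam
w = np.exp(s - s.max())
w /= w.sum()                             # normalised weights dnu0/dmu at the samples

mean_tw = w @ x
cov_tw = (x - mean_tw).T @ ((x - mean_tw) * w[:, None])
print(lam)       # here (0.4, -0.2)
print(mean_tw)   # close to a_bar
print(cov_tw)    # close to Sigma
```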

3 Incorporating Views using Polynomial-Divergence

In this section, we first note through a simple example involving a fat-tailed distribution that an optimal solution under I-divergence may not exist in certain settings. In fact, in this simple setting, one can obtain feasible solutions that are arbitrarily close to the original distribution in the sense of I-divergence. However, the form of such solutions may be inelegant and not reasonable as a model in financial settings. This motivates the use of other notions of distance between probability measures as objectives, such as f-divergence (introduced by Csiszar [10]). We first define general f-divergence and later concentrate on the case where $f$ has the form $f(u) = u^{\beta+1}$, $\beta > 0$. We refer to this as polynomial-divergence and note its relation with relative Tsallis entropy, relative Renyi entropy and I-divergence. We then characterize the optimal solution under polynomial-divergence. To provide greater insight into the nature of this solution, we explicitly solve a few examples in a simple setting involving a single random variable and a single moment constraint. We also note that in some settings an optimal solution may not exist under polynomial-divergence as well. With a view to arriving at a pragmatic remedy, we then describe a weighted least squares based methodology to arrive at a solution by perturbing certain parameters by a minimum amount.

3.1 Polynomial Divergence 

Example 2 Suppose that under $\mu$, the non-negative random variable $X$ has a Pareto distribution with probability density function

$$f(x) = \frac{\alpha - 1}{(1 + x)^{\alpha}}, \quad x > 0,\ \alpha > 2.$$

The mean under this pdf equals $1/(\alpha - 2)$. Suppose the view is that the mean should equal $c > 1/(\alpha - 2)$. It is well known and easily checked that

$$\frac{\int x\, e^{\lambda x} f(x)\, dx}{\int e^{\lambda x} f(x)\, dx}$$

is an increasing function of $\lambda$ for $\lambda \le 0$, equal to $1/(\alpha - 2)$ at $\lambda = 0$, while for $\lambda > 0$ both integrals are infinite. Hence, Assumption 1 does not hold for this example. A similar difficulty arises with other fat-tailed distributions such as the log-normal and the t-distribution.

To shed further light on Example 2, for $M > 0$, consider a probability distribution

$$f_\lambda^M(x) = \frac{\exp(\lambda x) f(x) I_{[0,M]}(x)}{\int_0^M \exp(\lambda x) f(x)\, dx},$$

where $I_A(\cdot)$ denotes the indicator function of the set $A$, that is, $I_A(x) = 1$ if $x \in A$ and 0 otherwise. Let $\lambda_M$ denote the solution to

$$\int_0^\infty x f_\lambda^M(x)\, dx = c \;\big(> 1/(\alpha - 2)\big).$$

Proposition 1 The sequence $\{\lambda_M\}$ and the I-divergence $\int \log\big( \frac{f_{\lambda_M}^M(x)}{f(x)} \big) f_{\lambda_M}^M(x)\, dx$ both converge to zero as $M \to \infty$.

Solutions such as $f_{\lambda_M}^M$ above are typically not representative of many applications, motivating the need for alternate methods to arrive at reasonable posterior measures that are close to $\mu$ and satisfy constraints such as (3) and (4) while not requiring that the optimal solution be obtained using exponential twisting.
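The behaviour described in Proposition 1 can be observed numerically. The sketch below assumes the Pareto parameter $\alpha = 3.5$ and a viewed mean $c = 1$, uses SciPy's `quad` and `brentq`, solves for $\lambda_M$ on $[0, M]$, and evaluates the I-divergence through the identity $D(f_{\lambda}^M \,\|\, f) = \lambda c - \log Z_M(\lambda)$, where $Z_M(\lambda) = \int_0^M e^{\lambda x} f(x)\, dx$; this identity follows directly from the form of $f_\lambda^M$. It illustrates the proposition and is not a proof of it.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

ALPHA = 3.5   # Pareto parameter of Example 2 (illustrative); prior mean 1/(ALPHA-2) = 2/3
C = 1.0       # viewed mean (illustrative), must exceed 1/(ALPHA-2)

def pareto_pdf(x):
    return (ALPHA - 1.0) / (1.0 + x) ** ALPHA

def truncated_twist(lam, M):
    """Return (log Z_M(lam), mean of f_lam^M) with Z_M(lam) = int_0^M e^{lam x} f(x) dx."""
    pts = [0.5 * M, 0.9 * M, 0.99 * M]          # help quad near the upper end
    w = lambda x: np.exp(lam * x) * pareto_pdf(x)
    z = quad(w, 0.0, M, points=pts, limit=500)[0]
    m1 = quad(lambda x: x * w(x), 0.0, M, points=pts, limit=500)[0]
    return np.log(z), m1 / z

def solve_truncated(M):
    """lambda_M solving int x f_lam^M(x) dx = C, and the I-divergence of f_{lambda_M}^M."""
    # The mean increases in lam; at lam = 50/M it is close to M > C,
    # and lam * x <= 50 keeps exp(lam * x) far from overflow.
    lam = brentq(lambda l: truncated_twist(l, M)[1] - C, 0.0, 50.0 / M)
    log_z, _ = truncated_twist(lam, M)
    return lam, lam * C - log_z     # D(f_lam^M || f) = lam * C - log Z_M(lam)

Ms = (50.0, 500.0, 5000.0)
lams, kls = zip(*(solve_truncated(M) for M in Ms))
for M, lam, d in zip(Ms, lams, kls):
    print(f"M = {M:6.0f}   lambda_M = {lam:.5f}   I-divergence = {d:.6f}")
```

Both columns shrink as $M$ grows, while the reweighted mass is pushed towards the truncation point $M$, which is exactly the kind of solution the text above calls unrepresentative.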

We now address this issue using polynomial-divergence. We first define f-divergence introduced by Csiszar (see [8], [9] and [10]).

Definition 1 Let $f : (0, \infty) \to \mathbb{R}$ be a strictly convex function. The f-divergence of a probability measure $\nu$ w.r.t. another probability measure $\mu$ equals

$$I_f(\nu \,\|\, \mu) := \int f\Big(\frac{d\nu}{d\mu}\Big)\, d\mu$$

if $\nu$ is absolutely continuous w.r.t. $\mu$ and $f(\frac{d\nu}{d\mu})$ is integrable w.r.t. $\mu$. Otherwise we set $I_f(\nu \,\|\, \mu) = \infty$.




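Note that I-divergence corresponds to the case $f(u) = u \log u$. Other popular examples of $f$ include

$$f(u) = -\log u, \qquad f(u) = u^{\beta+1},\ \beta > 0, \qquad f(u) = e^{u}.$$

In this section we consider $f(u) = u^{\beta+1}$, $\beta > 0$, and refer to the resulting f-divergence as polynomial-divergence. That is, we focus on

$$I_\beta(\nu \,\|\, \mu) := \int \Big(\frac{d\nu}{d\mu}\Big)^{\beta+1} d\mu.$$

It is easy to see using Jensen's inequality, since $\int (\frac{d\nu}{d\mu})^{\beta+1} d\mu \ge (\int \frac{d\nu}{d\mu}\, d\mu)^{\beta+1} = 1$, that

$$\min_{\nu \in \mathcal{P}(\mu)} \int \Big(\frac{d\nu}{d\mu}\Big)^{\beta+1} d\mu$$

is achieved by $\nu = \mu$.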

Our optimization problem $O_2(\beta)$ may be stated as:

$$\min_{\nu \in \mathcal{P}(\mu)} \int \Big(\frac{d\nu}{d\mu}\Big)^{\beta+1} d\mu \tag{8}$$

subject to (3) and (4). Minimizing polynomial-divergence can also be motivated through utility maximization considerations. See [17] for further details.

Remark 1 Relation with Relative Tsallis Entropy and Relative Renyi Entropy: Let $\alpha$ and $\gamma$ be positive real numbers. The relative Tsallis entropy with index $\alpha \neq 1$ of $\nu$ w.r.t. $\mu$ equals

$$S_\alpha(\nu \,\|\, \mu) := \int \frac{\big(\frac{d\nu}{d\mu}\big)^{\alpha - 1} - 1}{\alpha - 1}\, d\nu,$$

if $\nu$ is absolutely continuous w.r.t. $\mu$ and the integral is finite. Otherwise, $S_\alpha(\nu \,\|\, \mu) = \infty$. See, e.g., [12]. The relative Renyi entropy of order $\gamma$ of $\nu$ w.r.t. $\mu$ equals

$$H_\gamma(\nu \,\|\, \mu) := \frac{1}{\gamma - 1} \log\Big( \int \Big(\frac{d\nu}{d\mu}\Big)^{\gamma - 1} d\nu \Big) \quad \text{when } \gamma \neq 1$$

and

$$H_1(\nu \,\|\, \mu) := \int \log\Big(\frac{d\nu}{d\mu}\Big)\, d\nu,$$

if $\nu$ is absolutely continuous w.r.t. $\mu$ and the respective integrals are finite. Otherwise, $H_\gamma(\nu \,\|\, \mu) = \infty$ (see, e.g., [38]).

It can be shown that as $\gamma \to 1$, $H_\gamma(\nu \,\|\, \mu) \to H_1(\nu \,\|\, \mu) = D(\nu \,\|\, \mu)$, and as $\alpha \to 1$, $S_\alpha(\nu \,\|\, \mu) \to D(\nu \,\|\, \mu)$. Also, the following relations are immediate consequences of the above definitions:

$$I_\beta(\nu \,\|\, \mu) = 1 + \beta\, S_{\beta+1}(\nu \,\|\, \mu),$$

$$I_\beta(\nu \,\|\, \mu) = e^{\beta H_{\beta+1}(\nu \,\|\, \mu)}$$

and

$$\lim_{\beta \to 0} \frac{I_\beta(\nu \,\|\, \mu) - 1}{\beta} = D(\nu \,\|\, \mu).$$

Thus, polynomial-divergence is a strictly increasing function of both relative Tsallis entropy and relative Renyi entropy. Therefore minimizing polynomial-divergence is equivalent to minimizing relative Tsallis entropy or relative Renyi entropy.
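The identities in Remark 1 can be verified numerically for discrete distributions, where all integrals reduce to finite sums. This is a sketch with randomly drawn $\mu$, $\nu$ on five points and the illustrative choice $\beta = 0.5$; the function names are hypothetical.

```python
import numpy as np

def polynomial_divergence(nu, mu, beta):
    """I_beta(nu || mu) = sum_w (nu/mu)^(beta+1) mu."""
    return float(np.sum((nu / mu) ** (beta + 1) * mu))

def tsallis(nu, mu, alpha):
    """S_alpha(nu || mu) = sum_w ((nu/mu)^(alpha-1) - 1)/(alpha-1) nu, alpha != 1."""
    return float(np.sum(((nu / mu) ** (alpha - 1) - 1.0) / (alpha - 1) * nu))

def renyi(nu, mu, gamma):
    """H_gamma(nu || mu) = log(sum_w (nu/mu)^(gamma-1) nu)/(gamma-1), gamma != 1."""
    return float(np.log(np.sum((nu / mu) ** (gamma - 1) * nu)) / (gamma - 1))

def kl(nu, mu):
    """D(nu || mu), the I-divergence."""
    return float(np.sum(nu * np.log(nu / mu)))

rng = np.random.default_rng(3)
mu, nu = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
beta = 0.5
I = polynomial_divergence(nu, mu, beta)
print(I, 1 + beta * tsallis(nu, mu, beta + 1))       # I_beta = 1 + beta S_{beta+1}
print(I, np.exp(beta * renyi(nu, mu, beta + 1)))     # I_beta = exp(beta H_{beta+1})
small = 1e-7
print((polynomial_divergence(nu, mu, small) - 1) / small, kl(nu, mu))  # beta -> 0 limit
```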




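In the following assumption we specify the form of the solution to $O_2(\beta)$:

Assumption 2 There exist $\lambda_i \ge 0$ for $i = 1, \ldots, k_1$, and $\lambda_{k_1+1}, \ldots, \lambda_k \in \mathbb{R}$ such that

$$1 + \beta \sum_{i=1}^k \lambda_i g_i \ge 0 \quad \text{a.e. } (\mu), \tag{9}$$

$$\int \Big(1 + \beta \sum_{i=1}^k \lambda_i g_i\Big)^{1 + 1/\beta} d\mu < \infty, \tag{10}$$

and the probability measure $\nu^\lambda$ given by

$$\nu^\lambda(A) = \frac{\int_A \big(1 + \beta \sum_{i=1}^k \lambda_i g_i\big)^{1/\beta} d\mu}{\int \big(1 + \beta \sum_{i=1}^k \lambda_i g_i\big)^{1/\beta} d\mu} \tag{11}$$

for all $A \in \mathcal{F}$ satisfies the constraints (3) and (4). Furthermore, the complementary slackness conditions

$$\lambda_i \Big( c_i - \int g_i\, d\nu^\lambda \Big) = 0$$

hold for $i = 1, \ldots, k_1$.

Theorem 2 Under Assumption 2, $\nu^\lambda$ is an optimal solution to $O_2(\beta)$.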



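Remark 2 In many cases the inequality (9) can be equivalently expressed as a finite number of linear constraints on the $\lambda_i$'s. For example, if each $g_i$ has $\mathbb{R}_+$ as its range then clearly (9) is equivalent to $\lambda_i \ge 0$ for all $i$. As another illustration, consider $g_i = (x - K_i)^+$ for $i = 1, 2, 3$ and $K_1 < K_2 < K_3$. Since $1 + \beta \sum_i \lambda_i (x - K_i)^+$ is piecewise linear in $x$ and equals 1 for $x \le K_1$, it is non-negative everywhere if and only if it is non-negative at the kinks $K_2$ and $K_3$ and has non-negative slope $\beta(\lambda_1 + \lambda_2 + \lambda_3)$ beyond $K_3$. Hence (9) is equivalent to

$$1 + \beta\lambda_1 (K_2 - K_1) \ge 0,$$

$$1 + \beta\lambda_1 (K_3 - K_1) + \beta\lambda_2 (K_3 - K_2) \ge 0,$$

$$\lambda_1 + \lambda_2 + \lambda_3 \ge 0.$$

Note that if each $g_i$ has a finite $(m+1)$-th moment, then (10) holds for $\beta = 1/m$.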
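For the call-payoff illustration in Remark 2, the three linear inequalities can be compared with a brute-force check of (9) on a grid. The strikes, $\beta$ and the trial multipliers below are illustrative assumptions, and the grid check only inspects $x \in [0, 10^4]$, so it is a sanity check rather than a proof; the function names are hypothetical.

```python
import numpy as np

def call_feasible_linear(lam, beta, K):
    """Linear-inequality form of (9) for g_i = (x - K_i)^+, K_1 < K_2 < K_3."""
    l1, l2, l3 = lam
    k1, k2, k3 = K
    return (1 + beta * l1 * (k2 - k1) >= 0
            and 1 + beta * l1 * (k3 - k1) + beta * l2 * (k3 - k2) >= 0
            and l1 + l2 + l3 >= 0)

def call_feasible_grid(lam, beta, K, x_max=1e4, n=200_001):
    """Brute-force check of 1 + beta * sum_i lam_i (x - K_i)^+ >= 0 on [0, x_max]."""
    x = np.linspace(0.0, x_max, n)
    g = np.maximum(x[:, None] - np.asarray(K)[None, :], 0.0)
    return bool(np.all(1 + beta * g @ np.asarray(lam) >= -1e-12))

K = (90.0, 100.0, 110.0)   # illustrative strikes
beta = 0.5                 # illustrative divergence index
for lam in [(0.01, 0.02, -0.02), (-0.5, 0.0, 0.0), (0.1, 0.1, -0.3)]:
    print(lam, call_feasible_linear(lam, beta, K), call_feasible_grid(lam, beta, K))
```

The second trial fails at the kink $K_2$, the third only through the negative slope beyond $K_3$, which a short grid would miss.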



Existence and Uniqueness of Lagrange multipliers: To obtain the optimal distribution by formula (11), we must solve the set of $k$ nonlinear equations given by:

$$\frac{\int g_j \big(1 + \beta \sum_{i=1}^k \lambda_i g_i\big)^{1/\beta} d\mu}{\int \big(1 + \beta \sum_{i=1}^k \lambda_i g_i\big)^{1/\beta} d\mu} = c_j \quad \text{for } j = 1, 2, \ldots, k \tag{12}$$

for $\lambda_1, \lambda_2, \ldots, \lambda_k$. In view of Assumption 2 we define the set

$$\mathcal{A} := \{(\lambda_1, \lambda_2, \ldots, \lambda_k) \in \mathbb{R}^k : (9) \text{ and } (10) \text{ hold}\}.$$

A solution $\lambda := (\lambda_1, \lambda_2, \ldots, \lambda_k)$ to (12) is called feasible if it lies in the set $\mathcal{A}$. A feasible solution $\lambda$ is called strongly feasible if it further satisfies

$$1 + \beta \sum_{i=1}^k \lambda_i c_i > 0. \tag{13}$$



Theorem 3 states sufficient conditions under which a strongly feasible solution to (12) is unique.

Theorem 3 Suppose that the variance-covariance matrix of $g_1, g_2, \ldots, g_k$ under any measure $\nu \in \mathcal{P}(\mu)$ is positive definite, or equivalently, that for any $\nu \in \mathcal{P}(\mu)$, $\sum_{i=1}^k a_i g_i = c$ a.e. $(\nu)$ for some constants $c$ and $a_1, a_2, \ldots, a_k$ implies $c = 0$ and $a_i = 0$ for all $i = 1, 2, \ldots, k$. Then, if a strongly feasible solution to (12) exists, it is unique.



To solve (12) for $\lambda \in \mathcal{A}$ one may resort to numerical root-search techniques. A wide variety of these techniques are available in practice (see, for example, [32] and [36]). In our numerical experiments, we used the FindRoot routine from Mathematica, which employs a variant of the secant method.
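As an illustration of solving (12), the sketch below takes a discrete prior made of equally weighted Pareto samples (the heavy-tailed setting of Example 2 with $\alpha = 3.5$; NumPy's `pareto(2.5)` draws have density $2.5(1+x)^{-3.5}$), a single view on the mean with $g(x) = x$, and $\beta = 0.5$. Because $g \ge 0$, condition (9) holds automatically for $\lambda \ge 0$, and the reweighted mean is increasing in $\lambda$, so plain bisection (used here in place of a secant-type routine) suffices. All parameter values and helper names are assumptions for illustration.

```python
import numpy as np

def polynomial_weights(g, lam, beta, mu):
    """Probabilities of nu^lambda in (11) for a discrete prior mu.

    g: (N, k) values of g_i at the support points; lam: (k,).
    Returns None when the feasibility condition (9) fails.
    """
    base = 1.0 + beta * (g @ lam)
    if np.any(base < 0.0):                       # condition (9)
        return None
    w = mu * base ** (1.0 / beta)                # density (1 + beta lam.g)^(1/beta)
    return w / w.sum()

def solve_single_constraint(x, c, beta, tol=1e-12, lam_max=1e12):
    """Bisection for lam >= 0 with int x dnu^lambda = c, using g(x) = x >= 0."""
    g = x[:, None]
    mu = np.full(x.size, 1.0 / x.size)
    mean = lambda lam: polynomial_weights(g, np.array([lam]), beta, mu) @ x
    if mean(0.0) >= c:
        raise ValueError("this sketch only handles views that raise the mean")
    lo, hi = 0.0, 1.0
    while mean(hi) < c:                          # the mean is increasing in lam
        hi *= 2.0
        if hi > lam_max:
            raise ValueError("view c not attainable; see Section 3.3")
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if mean(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(4)
x = rng.pareto(2.5, size=50_000)   # Lomax draws, density 2.5 (1 + x)^(-3.5)
beta, c = 0.5, 1.0                  # illustrative divergence index and view on the mean
lam = solve_single_constraint(x, c, beta)
mu = np.full(x.size, 1.0 / x.size)
nu = polynomial_weights(x[:, None], np.array([lam]), beta, mu)
print(lam, nu @ x, 1 + beta * lam * c > 0)   # last entry: strong feasibility (13)
```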

Alternatively, one can use the dual approach of minimizing a convex function over the set $\mathcal{A}$. In the proof of Theorem 3 we have shown a convex function whose stationary points satisfy a set of equations equivalent to (12). Therefore, it is possible to recover a strongly feasible solution to (12) if one exists. Note that, contrary to the case with I-divergence, where the dual convex optimization problem is unconstrained, here we have a minimization problem of a convex function subject to the constraints (9) and (13).



3.2 Single Random Variable, Single Constraint Setting 



Proposition 2 below shows the existence of a solution to $O_2(\beta)$ in a single random variable, single constraint setting. We then apply this to a few specific examples.

Note that for any random variable $X$, a non-negative function $g$ and a positive integer $n$, we always have $E[g(X)^{n+1}] \ge E[g(X)]\,E[g(X)^n]$, since the random variables $g(X)$ and $g(X)^n$ are positively associated and hence have non-negative covariance.

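Proposition 2 Consider a random variable $X$ with pdf $f$, and a function $g : \mathbb{R}_+ \to \mathbb{R}_+$ such that $E[g(X)^{n+1}] < \infty$ for a positive integer $n$. Further suppose that $E[g(X)^{n+1}] > E[g(X)]\,E[g(X)^n]$. Then the optimization problem

$$\min_{\tilde f} \int_0^\infty \Big(\frac{\tilde f(x)}{f(x)}\Big)^{1+1/n} f(x)\, dx$$

subject to

$$\frac{\tilde E[g(X)]}{E[g(X)]} = a,$$

where $\tilde E$ denotes expectation under $\tilde f$, has a unique solution for $a \in \Big(1, \frac{E[g(X)^{n+1}]}{E[g(X)]\,E[g(X)^n]}\Big)$, given by

$$\tilde f(x) = \frac{\big(1 + \frac{\lambda}{n} g(x)\big)^n f(x)}{E\big[\big(1 + \frac{\lambda}{n} g(X)\big)^n\big]}, \quad x \ge 0,$$

where $\lambda$ is a positive root of the polynomial

$$\sum_{k=0}^{n} n^{-k} \binom{n}{k} \Big\{ E[g(X)^{k+1}] - a\,E[g(X)]\,E[g(X)^k] \Big\} \lambda^k = 0. \tag{14}$$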

As we note in the proof in the Appendix, the uniqueness of the solution follows from Theorem 3. Standard methods like Newton's, secant or bisection method can be applied to numerically solve (14) for $\lambda$.
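Equation (14) is an ordinary polynomial in $\lambda$, so once the moments $E[g(X)^k]$, $k = 0, \ldots, n+1$, are known its positive root can be read off with a polynomial root finder (here `numpy.roots`, a substitute for the iterative methods named above). The sketch assumes $g(x) = x$, a log-normal $X$ with parameters $0$ and $\sigma = 0.5$ (so $E[X^k] = e^{k\mu + k^2\sigma^2/2}$), $n = 2$ and $a = 1.2$, which lies inside the admissible interval $(1, e^{n\sigma^2})$; the helper names are hypothetical.

```python
import numpy as np
from math import comb, exp

def prop2_lambda(moments, a, n):
    """Positive root of polynomial (14); moments[k] = E[g(X)^k], k = 0..n+1."""
    m = moments
    coeffs = [comb(n, k) * n ** (-k) * (m[k + 1] - a * m[1] * m[k])
              for k in range(n + 1)]            # coefficient of lambda^k
    roots = np.roots(coeffs[::-1])              # np.roots wants highest degree first
    pos = [r.real for r in roots if abs(r.imag) < 1e-10 and r.real > 0]
    if len(pos) != 1:
        raise ValueError("no unique positive root; a may lie outside the admissible range")
    return pos[0]

def tilted_ratio(moments, lam, n):
    """E_tilde[g] / E[g] under f_tilde proportional to (1 + lam g / n)^n f."""
    num = sum(comb(n, k) * (lam / n) ** k * moments[k + 1] for k in range(n + 1))
    den = sum(comb(n, k) * (lam / n) ** k * moments[k] for k in range(n + 1))
    return num / (moments[1] * den)

mu_ln, sigma, n, a = 0.0, 0.5, 2, 1.2          # illustrative log-normal setting
moments = [exp(k * mu_ln + 0.5 * k**2 * sigma**2) for k in range(n + 2)]
lam = prop2_lambda(moments, a, n)
print(lam, tilted_ratio(moments, lam, n))     # the ratio reproduces a up to rounding
```

For $n = 1$ the same routine returns the closed form $\lambda = (a-1)/(e^{\mu+\sigma^2/2}(e^{\sigma^2} - a))$ derived in the log-normal example that follows.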
Example 3 Suppose $X$ is log-normally distributed with parameters $(\mu, \sigma^2)$ (that is, $\log X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$). Then its density function is

$$f(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\Big\{ -\frac{(\log x - \mu)^2}{2\sigma^2} \Big\} \quad \text{for } x > 0.$$

For the constraint (with $\tilde E$ denoting expectation under the updated density $\tilde f$)

$$\frac{\tilde E(X)}{E(X)} = a, \tag{15}$$

first consider the case $\beta = \frac{1}{n} = 1$. Then the probability distribution minimizing the polynomial-divergence is given by:

$$\tilde f(x) = \frac{(1 + \lambda x) f(x)}{1 + \lambda E(X)}, \quad x > 0.$$

From the constraint equation we have:

$$\frac{\tilde E(X)}{E(X)} = \frac{1}{E(X)} \int_0^\infty x\, \frac{(1 + \lambda x) f(x)}{1 + \lambda E(X)}\, dx = \frac{E(X) + \lambda E(X^2)}{E(X) + \lambda E(X)^2} = a,$$

or

$$\lambda = \frac{E(X)(a - 1)}{E(X^2) - a E(X)^2} = \frac{a - 1}{e^{\mu + \sigma^2/2}\big(e^{\sigma^2} - a\big)}.$$

Since $\frac{E(X) + \lambda E(X^2)}{E(X) + \lambda E(X)^2}$ increases with $\lambda$ and converges to $\frac{E(X^2)}{E(X)^2} = e^{\sigma^2}$ as $\lambda \to \infty$, it follows that our optimization problem has a solution if $a \in [1, e^{\sigma^2})$. Thus if $a \ge e^{\sigma^2}$ and $\beta = 1$, Assumption 2 cannot hold. Further, it is easily checked that $\frac{E(X^{n+1})}{E(X)E(X^n)} = e^{n\sigma^2}$. Therefore, for $\beta = \frac{1}{n}$, a solution always exists for $a \in [1, e^{n\sigma^2})$.

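Example 4 Suppose that the random variable $X$ has a Gamma distribution with density function

$$f(x) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)} \quad \text{for } x > 0,$$

and as before, the constraint is given by (15). Then it is easily seen that $\frac{E(X^{n+1})}{E(X)E(X^n)} = 1 + \frac{n}{\alpha}$, so that a solution with $\beta = \frac{1}{n}$ exists for $a \in [1, 1 + \frac{n}{\alpha})$.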

Example 5 Suppose $X$ has a Pareto distribution with probability density function

$$f(x) = \frac{\alpha - 1}{(x + 1)^{\alpha}} \quad \text{for } x > 0,$$

and as before, the constraint is given by (15). Then, for $n < \alpha - 2$, it is easily seen that

$$\frac{E[X^{n+1}]}{E[X]\,E[X^n]} = \frac{(n+1)(\alpha - 2)}{\alpha - n - 2}.$$

As in the previous examples, we see that a probability distribution minimizing the polynomial-divergence with $\beta = \frac{1}{n}$, $n < \alpha - 2$, exists when $a \in \Big[1, \frac{(n+1)(\alpha - 2)}{\alpha - n - 2}\Big)$.
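The upper endpoints of the admissible intervals in Examples 3 to 5 are the moment ratios $E[X^{n+1}]/(E[X]E[X^n])$, and they can be cross-checked by numerical quadrature. This is a sketch using SciPy's `quad` with illustrative parameter values; it compares the quadrature ratios with the closed forms stated in the three examples, with $n = 2$ chosen so that $n < \alpha - 2$ in the Pareto case.

```python
import numpy as np
from math import exp, gamma, log, pi, sqrt
from scipy.integrate import quad

def moment(pdf, k):
    """E[X^k] for a density on (0, inf), by numerical quadrature."""
    return quad(lambda x: x**k * pdf(x), 0.0, np.inf, limit=200)[0]

def moment_ratio(pdf, n):
    """E[X^(n+1)] / (E[X] E[X^n]): upper end of the admissible range of a."""
    return moment(pdf, n + 1) / (moment(pdf, 1) * moment(pdf, n))

mu_ln, sigma = 0.0, 0.5   # log-normal parameters (illustrative)
shape = 3.0                # Gamma shape alpha (illustrative)
alpha = 6.0                # Pareto parameter alpha (illustrative)
n = 2                      # beta = 1/n; Example 5 needs n < alpha - 2

def lognormal_pdf(x):
    if x <= 0.0:
        return 0.0
    return exp(-(log(x) - mu_ln) ** 2 / (2 * sigma**2)) / (x * sqrt(2 * pi * sigma**2))

def gamma_pdf(x):
    return x ** (shape - 1) * exp(-x) / gamma(shape)

def pareto_pdf(x):
    return (alpha - 1) / (1 + x) ** alpha

r_ln = moment_ratio(lognormal_pdf, n)
r_gamma = moment_ratio(gamma_pdf, n)
r_pareto = moment_ratio(pareto_pdf, n)
print(r_ln, exp(n * sigma**2))                              # Example 3
print(r_gamma, 1 + n / shape)                               # Example 4
print(r_pareto, (n + 1) * (alpha - 2) / (alpha - n - 2))    # Example 5
```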
3.3 Weighted Least Squares Approach to find Perturbed Solutions

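Note that an optimal solution to $O_2(\beta)$ may not exist when Assumption 2 does not hold. For instance, in the setting of Proposition 2, it may not exist for $a \ge \frac{E[g(X)^{n+1}]}{E[g(X)]\,E[g(X)^n]}$. In that case, selecting $\tilde f$ and the associated $\lambda$ so that $\frac{\tilde E[g(X)]}{E[g(X)]}$ is very close to $\frac{E[g(X)^{n+1}]}{E[g(X)]\,E[g(X)^n]}$ is a reasonable practical strategy.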

We now discuss how such solutions may be achieved for general problems using a weighted 
least squares approach. Let A denote (Ai, A 2 , . . . , A&) and v\ denote the measure defined by 



right hand side of (11 ). We write c for (ci, C2, . . . , Ck) and re-express the optimization problem 



C>2(/3) as Oa(y9, c) to explicitly show its dependence on c. Define 

C:=<cGlR fe :3AeA with / gj du^ = Cj > . 

Consider c ^ C. Then, 0%(fi, c) has no solution. In that case, from a practical viewpoint, it is 
reasonable to consider as a solution a measure v\ corresponding to a c' G C such that this c' 
is in some sense closest to c amongst all elements in C. To concretize the notion of closeness 
between two points we define the metric 



d(x,y) = Y^ 



{xi - Vtf 



Wi 

1=1 



between any two points x = (x\,X2, ■ ■ ■ , Xk) and y = (yi,y2, ■ ■ ■ , Vk) in ^ k , where w = 
(wi, W2, . . . , Wk) is a constant vector of weights expressing relative importance of the constraints 
: Wi > and £^ =1 w% = 1- 

Let c := arginf c ; e( j(i(c, c'), where C is the closure of C. Except in the simplest settings, C 
is difficult to explicitly evaluate and so determining c is also no-trivial. 

The optimization problem below, call it C>2(/3,£,c), gives solutions (A t ,y t ) such that the 
vector Ci defined as 

C* := ( / 9i du Xt , / 92 dvxt, ..., g k du 

has (i-distance from c arbitrarily close to d(co, c) when t is sufficiently close to zero. From 
implementation viewpoint, the measure V\ t may serve as a reasonable surrogate for the optimal 
solution sought for 2 (/3,c). (Avelleneda et al. j3j implement a related least squares strategy 
to arrive at an updated probability measure in discrete settings). 

$$\min_{\lambda \in \Lambda,\ y \in \mathbb R^k}\ \left[\int \left(\frac{d\nu_\lambda}{d\mu}\right)^{1+\beta} d\mu + \frac{1}{t}\sum_{i=1}^{k} \frac{y_i^2}{w_i}\right] \qquad (16)$$

subject to

$$\int g_j\, d\nu_\lambda = c_j + y_j \quad \text{for } j = 1, 2, \ldots, k. \qquad (17)$$

This optimization problem penalizes each deviation $y_j$ from $c_j$ by adding an appropriately weighted squared deviation term to the objective function. Let $(\lambda_t, y_t)$ denote a solution to $O_2(\beta, t, c)$. A solution can be seen to exist under the mild condition that the optimal $\lambda$ takes values within a compact set for every $t$. Note that then $c + y_t \in \mathcal C$.

Proposition 3 For any $c \in \mathbb R^k$, the solutions $(\lambda_t, y_t)$ to $O_2(\beta, t, c)$ satisfy the relation

$$\lim_{t \to 0} d(c, c + y_t) = d(c, c_0).$$
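To make (16)-(17) concrete, the sketch below solves a one-constraint toy version by brute-force grid search. Everything in it is an illustrative assumption rather than the paper's implementation:

- a discrete uniform prior on {1, ..., 10};
- $\beta = 1$, so that $\nu_\lambda \propto (1 + \lambda x)\mu$ and the divergence is $\int (d\nu_\lambda/d\mu)^2 d\mu$;
- a single weight $w = 1$;
- $\Lambda = [-1/10, 1]$, where the lower end keeps $1 + \lambda x \ge 0$.

The target mean $c = 3$ lies below the smallest attainable mean $11/3$, which is reached at $\lambda = -1/10$. So $O_2(\beta, c)$ has no solution and $c_0 = 11/3$.

```python
def penalized_solution(xs, probs, c, t, w=1.0, lam_hi=1.0, n_grid=4001):
    """Grid-search sketch of (16)-(17) for one mean constraint with beta = 1.

    nu_lambda has weights proportional to (1 + lam * x) * mu(x); the slack y is
    eliminated through y = (mean under nu_lambda) - c."""
    lam_lo = -1.0 / max(xs)          # keeps 1 + lam * x >= 0 on the support
    best = None
    for i in range(n_grid):
        lam = lam_lo + (lam_hi - lam_lo) * i / (n_grid - 1)
        z = sum((1.0 + lam * x) * p for x, p in zip(xs, probs))
        mean = sum(x * (1.0 + lam * x) * p for x, p in zip(xs, probs)) / z
        # polynomial divergence with beta = 1: sum of (dnu/dmu)^2 * mu
        div = sum(((1.0 + lam * x) / z) ** 2 * p for x, p in zip(xs, probs))
        y = mean - c                  # constraint (17) reads mean = c + y
        obj = div + y * y / (t * w)   # objective (16)
        if best is None or obj < best[0]:
            best = (obj, lam, mean)
    return best[1], best[2]

xs = list(range(1, 11))
probs = [0.1] * 10
for t in (10.0, 1.0, 0.1, 1e-4):
    lam_t, mean_t = penalized_solution(xs, probs, c=3.0, t=t)
    print(f"t={t:g}: lambda={lam_t:.4f}, mean={mean_t:.4f}")
```

As $t$ decreases, the mean under $\nu_{\lambda_t}$ moves toward $c_0 = 11/3$ and the squared distance to $c$ does not increase, in line with Proposition 3. Substituting $y = \text{mean} - c$ removes the slack variable, which is why the search runs over $\lambda$ only.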



4 Incorporating Constraints on Marginal Distributions

Next we state and prove the analogues of Theorems 1 and 2 when there is a constraint on the marginal distribution of a few components of the given random vector. Later, in Remark 3, we discuss how this generalizes to the case where the constraints involve moments and marginals of functions of the given random vector.

Let X and Y be two random vectors having a law $\mu$ given by a joint probability density function $f(x, y)$. Recall that $\mathcal P(\mu)$ is the set of all probability measures that are absolutely continuous w.r.t. $\mu$. If $\nu \in \mathcal P(\mu)$, then $\nu$ is also specified by a probability density function, say $\tilde f(\cdot)$, such that $\tilde f(x, y) = 0$ whenever $f(x, y) = 0$, and $\frac{d\nu}{d\mu} = \frac{\tilde f}{f}$. In view of this, we may formulate our optimization problem in terms of probability density functions instead of measures. Let $\mathcal P(f)$ denote the collection of density functions that are absolutely continuous with respect to the density $f$.

4.1 Incorporating Views on Marginals Using I-Divergence

Formally, our optimization problem $O_3$ is:

$$\min_{\tilde f \in \mathcal P(f)} \int \log\left(\frac{d\nu}{d\mu}\right) d\nu = \min_{\tilde f \in \mathcal P(f)} \int \log\left(\frac{\tilde f(x,y)}{f(x,y)}\right) \tilde f(x,y)\, dx\, dy,$$

subject to:

$$\int_y \tilde f(x,y)\, dy = g(x) \quad \text{for all } x, \qquad (18)$$

where $g(x)$ is a given marginal density function of X, and

$$\int_{x,y} h_i(x,y)\, \tilde f(x,y)\, dx\, dy = c_i \qquad (19)$$

for $i = 1, 2, \ldots, k$. For convenience of presentation, in the remainder of the paper we only consider equality constraints on moments of functions (as in (19)), ignoring inequality constraints. Inequality constraints can be handled easily, as in Assumptions 1 and 2, by introducing suitable non-negativity and complementary slackness conditions.

Some notation is needed to proceed further. Let $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k)$ and

$$f_\lambda(y|x) = \frac{\exp\left(\sum_i \lambda_i h_i(x,y)\right) f(y|x)}{\int_y \exp\left(\sum_i \lambda_i h_i(x,y)\right) f(y|x)\, dy} = \frac{\exp\left(\sum_i \lambda_i h_i(x,y)\right) f(x,y)}{\int_y \exp\left(\sum_i \lambda_i h_i(x,y)\right) f(x,y)\, dy}.$$

Further, let $f_\lambda(x, y)$ denote the joint density function of (X, Y) given by $f_\lambda(y|x) \times g(x)$ for all $x, y$, and let $E_\lambda$ denote the expectation under $f_\lambda$.

For a mathematical claim that depends on $x$, say $S(x)$, we write "$S(x)$ for almost all $x$ w.r.t. $g(x)dx$" to mean that $m_g(\{x : S(x) \text{ is false}\}) = 0$. Here $m_g$ is the measure induced by the density $g$; that is, $m_g(A) = \int_A g(x)\, dx$ for all measurable subsets $A$.

Assumption 3 There exists $\lambda \in \mathbb R^k$ such that

$$\int \exp\left(\sum_i \lambda_i h_i(x,y)\right) f(x,y)\, dy < \infty$$

for almost all $x$ w.r.t. $g(x)dx$, and the probability density function $f_\lambda$ satisfies the constraints given by (19). That is, for all $i = 1, 2, \ldots, k$, we have

$$E_\lambda[h_i(X,Y)] = c_i. \qquad (20)$$
Theorem 4 Under Assumption 3, $f_\lambda(\cdot)$ is an optimal solution to $O_3$.

In Theorem 5 we develop conditions that ensure the uniqueness of a solution to $O_3$ once it exists.

Theorem 5 Suppose that for almost all $x$ w.r.t. $g(x)dx$, conditional on $X = x$, no non-zero linear combination of the random variables $h_1(x, Y), h_2(x, Y), \ldots, h_k(x, Y)$ has zero variance w.r.t. the conditional density $f(y|x)$. Equivalently, suppose that for almost all $x$ w.r.t. $g(x)dx$, $\sum_i a_i h_i(x, Y) = c$ almost surely ($f(y|x)dy$) for some constants $c$ and $a_1, a_2, \ldots, a_k$ implies $c = 0$ and $a_i = 0$ for all $i = 1, 2, \ldots, k$. Then, if a solution to the constraint equations (20) exists, it is unique.
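On a finite grid, the construction behind Theorem 4 takes a few lines: tilt each conditional $f(y|x)$ exponentially, reweight by the target marginal $g$, and tune $\lambda$ so that (20) holds. The sketch below assumes a single constraint with $h(x, y) = y$ and uses bisection. Bisection is valid here because $E_\lambda[Y]$ is nondecreasing in $\lambda$ (each conditional tilted mean is). The joint pmf, target marginal and target value in the check are hypothetical.

```python
import math

def tilted_joint(f, g, lam):
    """f: dict {(x, y): prob}; g: dict {x: target marginal prob}.
    Returns f_lam(x, y) = g(x) * exp(lam*y) f(y|x) / sum_y exp(lam*y) f(y|x)."""
    out = {}
    for x in {xx for xx, _ in f}:
        row = {y: p for (xx, y), p in f.items() if xx == x}
        z = sum(math.exp(lam * y) * p for y, p in row.items())
        for y, p in row.items():
            out[(x, y)] = g[x] * math.exp(lam * y) * p / z
    return out

def tilted_mean(f, g, lam):
    """E_lam[Y] under the tilted joint density."""
    return sum(y * p for (_, y), p in tilted_joint(f, g, lam).items())

def solve_lambda(f, g, c, lo=-50.0, hi=50.0, iters=200):
    """Bisection for E_lam[Y] = c, i.e. constraint (20) with h(x, y) = y."""
    if not tilted_mean(f, g, lo) <= c <= tilted_mean(f, g, hi):
        raise ValueError("c is outside the range attainable for lambda in [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean(f, g, mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

By construction the result has marginal $g$ in $x$ and satisfies the moment constraint, which is the pair of requirements (18) and (20).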

Remark 3 Theorem 4, as stated, is applicable when the updated marginal distribution of a sub-vector X of the given random vector (X, Y) is specified. More generally, a routine change-of-variables technique allows a similar specification on a function of the given random vector to be incorporated as well. We now illustrate this.

Let $Z = (Z_1, Z_2, \ldots, Z_N)$ denote a random vector taking values in $S \subseteq \mathbb R^N$ and having a (prior) density function $f_Z$. Suppose the constraints are as follows:

• $(v_1(Z), v_2(Z), \ldots, v_{k_1}(Z))$ has a joint density function given by $g(\cdot)$.

• $E[v_{k_1+1}(Z)] = c_1$, $E[v_{k_1+2}(Z)] = c_2$, ..., $E[v_{k_2}(Z)] = c_{k_2-k_1}$,

where $0 < k_1 < k_2 \le N$ and $v_1(\cdot), v_2(\cdot), \ldots, v_{k_2}(\cdot)$ are some functions on $S$.

If $k_2 < N$, we define $N - k_2$ functions $v_{k_2+1}(\cdot), v_{k_2+2}(\cdot), \ldots, v_N(\cdot)$ such that the function $v : S \to \mathbb R^N$ defined by $v(z) = (v_1(z), v_2(z), \ldots, v_N(z))$ has a nonsingular Jacobian a.e. That is,

$$J(z) := \det\left(\frac{\partial v_i(z)}{\partial z_j}\right)_{i,j=1}^{N} \ne 0 \quad \text{for almost all } z \text{ w.r.t. } f_Z,$$

where we are assuming that the functions $v_1(\cdot), v_2(\cdot), \ldots, v_{k_2}(\cdot)$ allow such a construction.

Consider $X = (X_1, \ldots, X_{k_1})$, where $X_i = v_i(Z)$ for $i \le k_1$, and $Y = (Y_1, \ldots, Y_{N-k_1})$, where $Y_i = v_{k_1+i}(Z)$ for $i \le N - k_1$. Let $f(\cdot, \cdot)$ denote the density function of (X, Y). Then, by the change of variables formula for densities,

$$f(x,y) = f_Z(w(x,y))\, \left|J(w(x,y))\right|^{-1},$$

where $w(\cdot)$ denotes the local inverse function of $v(\cdot)$; that is, if $v(z) = (x, y)$, then $z = w(x, y)$.

The constraints can easily be expressed in terms of (X, Y) as

X has joint density given by $g(\cdot)$

and

$$E[Y_i] = c_i \quad \text{for } i = 1, 2, \ldots, (k_2 - k_1). \qquad (21)$$

Setting $k = k_2 - k_1$, it follows from Theorem 4 that the optimal density function of (X, Y) is

$$f_\lambda(x,y) = \frac{e^{\lambda_1 y_1 + \lambda_2 y_2 + \cdots + \lambda_k y_k} f(x,y)}{\int e^{\lambda_1 y_1 + \lambda_2 y_2 + \cdots + \lambda_k y_k} f(x,y)\, dy} \times g(x),$$

where the $\lambda_i$'s are chosen to satisfy (21).

Again by the change of variables formula, it follows that the optimal density of Z is given by

$$\tilde f_Z(z) = f_\lambda(v_1(z), v_2(z), \ldots, v_N(z))\, |J(z)|. \quad \square$$
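The change-of-variables step in Remark 3 can be sanity-checked in a case where everything is Gaussian. The sketch below is an illustration under assumptions of our own. It takes Z to be standard bivariate normal and uses the linear map $v(z) = (z_1 + z_2, z_2)$. Then $X = Z_1 + Z_2$ plays the role of $v_1(Z)$ and $Y = Z_2$ completes the map. The Jacobian is $\det A = 1$, and the transformed density should coincide with the $N(0, AA^T)$ density. The helper names are hypothetical.

```python
import math

def mvn2_pdf(u, cov):
    """Density of a zero-mean bivariate normal with 2x2 covariance `cov` at u."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    q = sum(u[i] * inv[i][j] * u[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def transformed_pdf(xy, A, f_z):
    """f_{X,Y}(x, y) = f_Z(w(x, y)) / |J(w(x, y))| for the linear map v(z) = A z,
    whose Jacobian is the constant det(A) and whose inverse is w = A^{-1}."""
    (a, b), (c, d) = A
    det = a * d - b * c
    z = ((d * xy[0] - b * xy[1]) / det, (-c * xy[0] + a * xy[1]) / det)
    return f_z(z) / abs(det)
```

For a nonlinear $v$, the same code pattern applies with `z` obtained from a (local) inverse and `det` replaced by the Jacobian evaluated at that point.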
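4.2 Incorporating Constraints on Marginals Using Polynomial-Divergence

Extending Theorem 4 to the case of polynomial-divergence is straightforward; we state the details for completeness. As in the case of I-divergence, the following notation simplifies our exposition. Writing $f(x) = \int f(x,y)\, dy$ for the prior marginal density of X, let

$$f_{\lambda,\beta}(y|x) = \frac{\left(1 + \beta\left(\frac{f(x)}{g(x)}\right)^{\beta}\sum_j \lambda_j h_j(x,y)\right)^{1/\beta} f(y|x)}{\int_y \left(1 + \beta\left(\frac{f(x)}{g(x)}\right)^{\beta}\sum_i \lambda_i h_i(x,y)\right)^{1/\beta} f(y|x)\, dy} = \frac{\left(1 + \beta\left(\frac{f(x)}{g(x)}\right)^{\beta}\sum_i \lambda_i h_i(x,y)\right)^{1/\beta} f(x,y)}{\int_y \left(1 + \beta\left(\frac{f(x)}{g(x)}\right)^{\beta}\sum_i \lambda_i h_i(x,y)\right)^{1/\beta} f(x,y)\, dy}.$$

If the marginal of X is given by $g(x)$, then the joint density $f_{\lambda,\beta}(y|x) \times g(x)$ is denoted by $f_{\lambda,\beta}(x, y)$, and $E_{\lambda,\beta}$ denotes the expectation under $f_{\lambda,\beta}(\cdot)$.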
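Consider the optimization problem $O_4(\beta)$:

$$\min_{\tilde f \in \mathcal P(f)} \int \left(\frac{\tilde f(x,y)}{f(x,y)}\right)^{1+\beta} f(x,y)\, dx\, dy,$$

subject to (18) and (19).

Assumption 4 There exists $\lambda \in \mathbb R^k$ such that

$$1 + \beta\left(\frac{f(x)}{g(x)}\right)^{\beta}\sum_i \lambda_i h_i(x,y) > 0 \qquad (22)$$

for almost all $(x, y)$ w.r.t. $f(y|x) \times g(x)\, dy\, dx$, and

$$\int \left(1 + \beta\left(\frac{f(x)}{g(x)}\right)^{\beta}\sum_i \lambda_i h_i(x,y)\right)^{1/\beta} f(x,y)\, dy < \infty \qquad (23)$$

for almost all $x$ w.r.t. $g(x)dx$. Further, the probability density function $f_{\lambda,\beta}(x, y)$ satisfies the constraints given by (19). That is, for all $i = 1, 2, \ldots, k$, we have

$$E_{\lambda,\beta}[h_i(X,Y)] = c_i. \qquad (24)$$

Theorem 6 Under Assumption 4, $f_{\lambda,\beta}(\cdot)$ is an optimal solution to $O_4(\beta)$.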



Analogous to the discussion in Remark 3, a suitable change of variables lets us adapt the above theorem to the case where the constraints involve marginal distributions and/or moments of functions of a given random vector.
We conclude this section with a brief discussion of the uniqueness of the solution to $O_4(\beta)$.

Any $\lambda \in \mathbb R^k$ satisfying (22), (23) and (24) is called a feasible solution to $O_4(\beta)$. A feasible solution $\lambda$ is called strongly feasible if it further satisfies

$$\left(\frac{f(x)}{g(x)}\right)^{\beta}\sum_i \lambda_i c_i > 0 \quad \text{for almost all } x \text{ w.r.t. } g(x)dx.$$

The following theorem can be proved using arguments similar to those used to prove Theorem 3. We omit the details.

Theorem 7 Suppose that for almost all $x$ w.r.t. $g(x)dx$, conditional on $X = x$, no non-zero linear combination of $h_1(x, Y), h_2(x, Y), \ldots, h_k(x, Y)$ has zero variance under any measure $\nu$ absolutely continuous w.r.t. $f(y|x)$. Equivalently, suppose that for any measure $\nu$ absolutely continuous w.r.t. $f(y|x)$, $\sum_{i=1}^k a_i h_i(x, Y) = c$ almost everywhere ($\nu$) for some constants $c$ and $a_1, a_2, \ldots, a_k$ implies $c = 0$ and $a_i = 0$ for all $i = 1, 2, \ldots, k$. Then, if a strongly feasible solution to (24) exists, it is unique.



Note that when Assumption 3 (respectively, Assumption 4) does not hold, the weighted least squares methodology developed in Section 3.3 can again be used to arrive at a reasonable perturbed solution that may be useful from an implementation viewpoint.

5 Portfolio Modeling in Markowitz Framework 

In this section we apply the methodology developed in Section 4.1 to the Markowitz framework: 
namely to the setting where there are N assets whose returns under the 'prior distribution' are 
multivariate Gaussian. Here, we explicitly identify the posterior distribution that incorporates 
views/constraints on marginal distribution of some random variables and moment constraints 
on other random variables. As mentioned in the introduction, an important application of 
our approach is that if for a particular portfolio of assets, say an index, it is established that 
the return distribution is fat-tailed (specifically, the pdf is a regularly varying function), say 
with the density function g, then by using that as a constraint, one can arrive at an updated 
posterior distribution for all the underlying assets. Furthermore, we show that if an underlying 
asset has a non-zero correlation with this portfolio under the prior distribution, then under the 
posterior distribution, this asset has a tail distribution similar to that given by g. 

Let $(X, Y) = (X_1, X_2, \ldots, X_{N-k}, Y_1, Y_2, \ldots, Y_k)$ have an N-dimensional multivariate Gaussian distribution with mean $\mu = (\mu_x, \mu_y)$ and variance-covariance matrix

$$\Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}.$$

We consider a posterior distribution that satisfies the view that

X has probability density function $g(x)$ and $E(Y) = a$,

where $g(x)$ is a given probability density function on $\mathbb R^{N-k}$ with finite first moments along each component, and $a$ is a given vector in $\mathbb R^k$. As discussed in Remark 3 (see also Example 7 in Section 7), some views reduce easily to the above setting by a suitable change of variables. These are views on marginal distributions of linear combinations of the underlying assets and/or on moments of linear functions of the underlying assets.
To find the distribution of (X, Y) that incorporates the above views, we solve the minimization problem $O_5$:

$$\min_{\tilde f \in \mathcal P(f)} \int \log\left(\frac{\tilde f(x,y)}{f(x,y)}\right) \tilde f(x,y)\, dx\, dy$$

subject to the constraints

$$\int_{y \in \mathbb R^k} \tilde f(x,y)\, dy = g(x) \quad \forall x$$

and

$$\int_{x \in \mathbb R^{N-k}} \int_{y \in \mathbb R^k} y\, \tilde f(x,y)\, dy\, dx = a, \qquad (25)$$

where $f(x, y)$ is the density of the N-variate normal distribution, denoted by $\mathcal N_N(\mu, \Sigma)$.

Proposition 4 Under the assumption that $\Sigma_{xx}$ is invertible, the optimal solution to $O_5$ is given by

$$\tilde f(x,y) = g(x) \times \tilde f(y|x), \qquad (26)$$

where $\tilde f(y|x)$ is the probability density function of

$$\mathcal N_k\left(a + \Sigma_{yx}\Sigma_{xx}^{-1}\left(x - E_g[X]\right),\ \Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\right),$$

where $E_g(X)$ is the expectation of X under the density function $g$.
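For a single conditioning variable X, Proposition 4 can be evaluated directly. The sketch below uses a hypothetical function name, a scalar X and plain Python lists for the k-dimensional Y. It returns the conditional mean vector and covariance matrix of Y given X = x. Averaging the conditional mean over any $g$ with mean $E_g(X)$ gives back $a$, which is constraint (25). Note that the prior means $\mu_x$ and $\mu_y$ do not enter.

```python
def posterior_conditional(x, a, sigma_xx, sigma_xy, sigma_yy, mean_g):
    """Mean and covariance of Y | X = x under (26) when X is scalar.

    a: target E[Y] (length k); sigma_xx: Var(X); sigma_xy: list of Cov(X, Y_j);
    sigma_yy: k x k list of lists; mean_g: E_g(X) under the imposed marginal g."""
    k = len(a)
    mean = [a[i] + sigma_xy[i] * (x - mean_g) / sigma_xx for i in range(k)]
    cov = [[sigma_yy[i][j] - sigma_xy[i] * sigma_xy[j] / sigma_xx for j in range(k)]
           for i in range(k)]
    return mean, cov
```

The covariance does not depend on $x$, so only the conditional mean moves with the heavy-tailed X. This is the mechanism behind the tail result that follows.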

Tail behavior of the marginals of the posterior distribution: We now specialize to the case where X is a single random variable, so that $N = k + 1$, and the pdf $g$ satisfies Assumption 5 below. Specifically, (X, Y) is distributed as $\mathcal N_{k+1}(\mu, \Sigma)$ with

$$\Sigma = \begin{pmatrix} \sigma_{xx} & \sigma_{xy}^T \\ \sigma_{xy} & \Sigma_{yy} \end{pmatrix},$$

where $\sigma_{xx} = Var(X)$ and $\sigma_{xy} = (\sigma_{xy_1}, \sigma_{xy_2}, \ldots, \sigma_{xy_k})^T$ with $\sigma_{xy_j} = Cov(X, Y_j)$.

Assumption 5 The pdf $g(\cdot)$ is regularly varying; that is, there exists a constant $\alpha > 1$ ($\alpha > 1$ is needed for $g$ to be integrable) such that

$$\lim_{t \to \infty} \frac{g(\eta t)}{g(t)} = \frac{1}{\eta^{\alpha}}$$

for all $\eta > 0$ (see, for instance, [?]). In addition, for any $a \in \mathbb R$ and $b \in \mathbb R_+$,

$$\frac{g(b(t - s - a))}{g(t)} \le h(s) \qquad (27)$$

for some non-negative function $h(\cdot)$ that is independent of $t$ (but possibly depends on $a$ and $b$) and satisfies $E h(Z) < \infty$ whenever Z has a Gaussian distribution.

Remark 4 Assumption 5 holds, for instance, when $g$ corresponds to the t-distribution with $n$ degrees of freedom, that is,

$$g(s) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{s^2}{n}\right)^{-(n+1)/2}.$$
Clearly, $g$ is regularly varying with $\alpha = n + 1$. To see (27), note that

$$\frac{g(b(t-s-a))}{g(t)} = \left(\frac{1 + t^2/n}{1 + b^2(t-s-a)^2/n}\right)^{(n+1)/2}.$$

Putting $t' = bt/\sqrt n$, $s' = b(s+a)/\sqrt n$ and $c = 1/b$, we have

$$\frac{1 + t^2/n}{1 + b^2(t-s-a)^2/n} = \frac{1 + c^2t'^2}{1 + (t'-s')^2}.$$

Now (27) readily follows from the fact that

$$\frac{1 + c^2t'^2}{1 + (t'-s')^2} \le \max\{1, c^2\} + c^2s'^2 + c^2|s'|$$

for any two real numbers $s'$ and $t'$. To verify the last inequality, write $u = t' - s'$, so that $1 + c^2t'^2 = 1 + c^2u^2 + c^2s'^2 + 2c^2s'u$. Then

$$\frac{1 + c^2t'^2}{1 + u^2} = \frac{1 + c^2u^2}{1 + u^2} + \frac{c^2s'^2}{1 + u^2} + \frac{2c^2s'u}{1 + u^2} \le \max\{1, c^2\} + c^2s'^2 + c^2|s'|,$$

since $2|u| \le 1 + u^2$. Hence (27) holds with $h(s) = \left(\max\{1, c^2\} + c^2s'^2 + c^2|s'|\right)^{(n+1)/2}$, which grows only polynomially in $s$, so that $E h(Z) < \infty$ for Gaussian Z. Note that if $h(x) = |x|^m$ or $h(x) = \exp(\lambda x)$ for any $m$ or $\lambda$, then the last condition in Assumption 5 holds.
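A randomized spot check (not a proof) of the elementary inequality used in Remark 4 is shown below. The sampling ranges for $t'$, $s'$ and $c$ are arbitrary choices, and the function names are hypothetical.

```python
import random

def ratio_lhs(tp, sp, c):
    """(1 + c^2 t'^2) / (1 + (t' - s')^2)."""
    return (1.0 + c * c * tp * tp) / (1.0 + (tp - sp) ** 2)

def bound_rhs(sp, c):
    """max{1, c^2} + c^2 s'^2 + c^2 |s'|."""
    return max(1.0, c * c) + c * c * sp * sp + c * c * abs(sp)

def worst_relative_gap(n_trials=100_000, seed=1):
    """Largest (lhs - rhs) / rhs over random draws; it should never be positive."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(n_trials):
        tp = rng.uniform(-1e3, 1e3)
        sp = rng.uniform(-50.0, 50.0)
        c = rng.uniform(0.01, 10.0)
        rhs = bound_rhs(sp, c)
        worst = max(worst, (ratio_lhs(tp, sp, c) - rhs) / rhs)
    return worst
```

A non-positive worst gap is consistent with the bound. The relative form keeps floating-point rounding from masquerading as a violation when the right-hand side is large.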

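From Proposition 4, we note that the posterior distribution of (X, Y) is

$$\tilde f(x,y) = g(x) \times \tilde f(y|x),$$

where $\tilde f(y|x)$ is the probability density function of

$$\mathcal N_k\left(a + \frac{x - E_g(X)}{\sigma_{xx}}\,\sigma_{xy},\ \Sigma_{yy} - \frac{1}{\sigma_{xx}}\sigma_{xy}\sigma_{xy}^T\right),$$

where $E_g(X)$ is the expectation of X under the density function $g$. Let $\tilde f_{Y_1}$ denote the marginal density of $Y_1$ under the above posterior distribution. Theorem 8 states a key result of this section.

Theorem 8 Under Assumption 5, if $\sigma_{xy_1} \ne 0$, then

$$\lim_{s \to \infty} \frac{\tilde f_{Y_1}(s)}{g(s)} = \left(\frac{\sigma_{xy_1}}{\sigma_{xx}}\right)^{\alpha - 1}. \qquad (28)$$

Note that (28) implies that

$$\lim_{x \to \infty} \frac{\int_x^\infty \tilde f_{Y_1}(s)\, ds}{\int_x^\infty g(s)\, ds} = \left(\frac{\sigma_{xy_1}}{\sigma_{xx}}\right)^{\alpha - 1}.$$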
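The tail version of (28) can be illustrated numerically without simulation. The sketch below rests on purely illustrative assumptions:

- $g$ is the standard t-density with 3 degrees of freedom, so $\alpha = 4$ and $E_g(X) = 0$;
- $a = 0$;
- $\sigma_{xy_1}/\sigma_{xx} = 0.5$;
- the conditional standard deviation of $Y_1$ given X is 1.

Under (26), $Y_1$ is then $0.5X$ plus independent $N(0, 1)$ noise. The code integrates the $t_3$ tail against the Gaussian density and compares the tail ratio with $(0.5)^{\alpha - 1} = 0.125$. The function names are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)

def t3_sf(x):
    """P(T > x) for a standard t variable with 3 degrees of freedom, x > 0, from
    F(x) = 1/2 + (1/pi) [sqrt(3) x / (3 + x^2) + atan(x / sqrt(3))]."""
    return (math.atan(SQRT3 / x) - SQRT3 * x / (3.0 + x * x)) / math.pi

def y1_sf(s, ratio, tau, n=4001, width=12.0):
    """P(Y1 > s) for Y1 = ratio * X + W with X ~ t_3 and W ~ N(0, tau^2) independent,
    by the trapezoidal rule over w in [-width*tau, width*tau] (needs ratio > 0)."""
    lo, hi = -width * tau, width * tau
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        w = lo + i * h
        dens = math.exp(-0.5 * (w / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * dens * t3_sf((s - w) / ratio)
    return total * h
```

At $s = 200$ the ratio `y1_sf(200.0, 0.5, 1.0) / t3_sf(200.0)` should already be close to 0.125. The Gaussian part only perturbs the argument of the power-law tail by a relatively small amount.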
6 Comparing Different Objectives

In many examples, either I-divergence or polynomial-divergence can serve as the objective function for arriving at an updated probability measure, so it is natural to compare the optimal solutions in these cases. Note that the total variation distance between two probability measures $\mu$ and $\nu$ defined on $(\Omega, \mathcal F)$ equals

$$\sup\{|\mu(A) - \nu(A)| : A \in \mathcal F\}.$$

This may also serve as an objective function in our search for a reasonable probability measure that incorporates expert views and is close to the original probability measure. It has the added advantage of being a metric (in particular, it satisfies the triangle inequality).

We now compare these three types of objectives in two simple settings to get a qualitative flavor of the differences in the optimal solutions (a rigorous analysis in general settings may be a subject for future research). The first setting is that of a single random variable whose prior distribution is exponential. In the second setting, the views correspond to probability assignments to a mutually exclusive and exhaustive set of events.

6.1 Exponential Prior Distribution

Suppose that the random variable X is exponentially distributed with rate $a$ under $\mu$. Then its pdf equals

$$f(x) = ae^{-ax}, \quad x > 0.$$

Now suppose that our view is that, under the updated measure $\nu$ with density function $\tilde f$,

$$E(X) = \int x \tilde f(x)\, dx = \frac{1}{\gamma} > \frac{1}{a}.$$

I-divergence: When the objective is to minimize I-divergence, the optimal solution is obtained as an exponentially twisted distribution that satisfies the desired constraint. It is easy to see that exponentially twisting an exponential distribution with rate $a$ by an amount $\theta$ leads to another exponential distribution with rate $a - \theta$ (assuming that $\theta < a$). Therefore, in our case (taking $\theta = a - \gamma$),

$$\tilde f(x) = \gamma e^{-\gamma x}, \quad x > 0,$$

satisfies the given constraint and is the solution to this problem. Note here that the tail distribution function equals $\exp(-\gamma x)$, which is heavier than $\exp(-ax)$, the original tail distribution of X.

Polynomial-divergence: Now consider the case where the objective corresponds to a polynomial-divergence with parameter equal to $\beta$, i.e., it equals

$$\int \left(\frac{\tilde f(x)}{f(x)}\right)^{1+\beta} f(x)\, dx.$$

Under this objective, the optimal pdf is

$$\tilde f(x) = \frac{(1 + \beta\lambda x)^{1/\beta} ae^{-ax}}{\int (1 + \beta\lambda x)^{1/\beta} ae^{-ax}\, dx},$$

where $\lambda \ge 0$ is chosen so that the mean under $\tilde f$ equals $1/\gamma$.

While this may not have a closed-form solution, it is clear that, on a logarithmic scale, $\tilde f(x)$ is asymptotically similar to $\exp(-ax)$ as $x \to \infty$. Hence it has a lighter tail than the solution under I-divergence.
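For $\beta = 1$ the polynomial-divergence solution does have a closed form. The mean of $(1 + \lambda x)ae^{-ax}$, once normalized, is $(1/a + 2\lambda/a^2)/(1 + \lambda/a)$. Setting it equal to $1/\gamma$ gives $\lambda = a(a - \gamma)/(2\gamma - a)$, which is admissible only when $a/2 < \gamma \le a$, because the mean can never exceed $E[X^2]/E[X] = 2/a$. This derivation and the function names below are our own illustration of the two tails, not part of the original analysis.

```python
import math

def poly_beta1_lambda(a, gamma):
    """lambda >= 0 giving mean 1/gamma under f~ proportional to (1 + lambda x) a e^{-a x}."""
    if not a / 2 < gamma <= a:
        raise ValueError("beta = 1 needs a/2 < gamma <= a (the mean is below 2/a)")
    return a * (a - gamma) / (2 * gamma - a)

def poly_density(x, a, lam):
    """beta = 1 solution; its normalizer is E[1 + lam X] = 1 + lam / a."""
    return (1 + lam * x) * a * math.exp(-a * x) / (1 + lam / a)

def idiv_density(x, gamma):
    """I-divergence solution: exponential with rate gamma."""
    return gamma * math.exp(-gamma * x)
```

With $a = 2$ and $\gamma = 1.5$ the formula gives $\lambda = 1$. Far in the tail, the $\beta = 1$ density decays like $xe^{-2x}$, while the I-divergence solution decays like $e^{-1.5x}$.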

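Total variation distance: With total variation distance as the objective, we show that, given any $\epsilon > 0$, we can find a new density function $\tilde f$ under which the mean equals $1/\gamma$ while the total variation distance is at most $\epsilon$. Thus the optimal value of the objective function is zero. However, no pdf attains this value, since a distance of zero would force $\tilde f = f$, whose mean is $1/a \ne 1/\gamma$.

To see this, consider

$$\tilde f(x) = \left(1 - \frac{\epsilon}{2}\right)ae^{-ax} + \frac{\epsilon}{2}\cdot\frac{1}{2\delta}\,\mathbb 1_{(\alpha-\delta,\, \alpha+\delta)}(x) \quad \text{for } x > 0,$$

where $0 < \delta < \alpha$. Then

$$E(X) = \int x \tilde f(x)\, dx = \left(1 - \frac{\epsilon}{2}\right)\frac{1}{a} + \frac{\epsilon}{2}\,\alpha.$$

Thus, given any $\epsilon$, if we select

$$\alpha = \frac{2}{\epsilon}\left(\frac{1}{\gamma} - \left(1 - \frac{\epsilon}{2}\right)\frac{1}{a}\right),$$

which is positive since $1/\gamma > 1/a$, we see that

$$E(X) = \left(1 - \frac{\epsilon}{2}\right)\frac{1}{a} + \frac{\epsilon}{2}\,\alpha = \frac{1}{\gamma}.$$

We now show that the total variation distance between $\tilde f$ and $f$ is at most $\epsilon$. To see this, note that

$$\left|\int_A \tilde f(x)\, dx - \int_A f(x)\, dx\right| \le \left(\frac{\epsilon}{2}\right) P(A)$$

for any set $A$ disjoint from $(\alpha - \delta, \alpha + \delta)$, where the probability $P$ corresponds to the density $f$. Furthermore, letting $L(S)$ denote the Lebesgue measure of a set $S$,

$$\left|\int_A \tilde f(x)\, dx - \int_A f(x)\, dx\right| \le \left(\frac{\epsilon}{4\delta}\right) L(A) + \left(\frac{\epsilon}{2}\right) P(A)$$

for any set $A \subset (\alpha - \delta, \alpha + \delta)$. Thus, for any set $A \subset (0, \infty)$,

$$\left|\int_A \tilde f(x)\, dx - \int_A f(x)\, dx\right| \le \left(\frac{\epsilon}{4\delta}\right) L\big(A \cap (\alpha - \delta, \alpha + \delta)\big) + \left(\frac{\epsilon}{2}\right) P(A).$$

Therefore,

$$\sup_A \left|\int_A \tilde f(x)\, dx - \int_A f(x)\, dx\right| \le \left(\frac{\epsilon}{4\delta}\right) 2\delta + \frac{\epsilon}{2} = \epsilon.$$

This also illustrates that it may be difficult to obtain an elegant characterization of solutions under the total variation distance. From this viewpoint, the other two measures are more attractive.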
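A numerical check of the total variation argument is given below. It assumes the perturbed density is the mixture $(1 - \epsilon/2)f + (\epsilon/2)\cdot\text{Uniform}(\alpha - \delta, \alpha + \delta)$ with the bump centre $\alpha$ chosen to hit the mean $1/\gamma$. It uses the fact that, for densities, the total variation distance equals $\int (\tilde f - f)^+$. The function names and parameter values are illustrative.

```python
import math

def bump_center(a, gamma, eps):
    """alpha making (1 - eps/2) * Exp(a) + (eps/2) * Uniform(alpha +/- delta) have mean 1/gamma."""
    return (2.0 / eps) * (1.0 / gamma - (1.0 - eps / 2.0) / a)

def tv_distance(a, eps, alpha, delta, n=20001):
    """Integral of (f~ - f)^+ with f = Exp(a). Since f~ - f = (eps/2)(u - f), where u is
    the uniform density, the positive part can only be non-zero on the bump."""
    h = 2.0 * delta / (n - 1)
    total = 0.0
    for i in range(n):
        x = alpha - delta + i * h
        diff = (eps / 2.0) * (1.0 / (2.0 * delta) - a * math.exp(-a * x))
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * max(diff, 0.0)
    return total * h
```

In this construction the distance is in fact at most $\epsilon/2$, comfortably inside the bound $\epsilon$ derived above. Letting $\epsilon \to 0$ drives it to zero while the mean stays at $1/\gamma$.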

6.2 Views on Probability of Disjoint Sets

Here, we consider the case where the views are probability assignments, under the posterior measure $\nu$, to a mutually exclusive and exhaustive set of events. We note that the objective functions associated with I-divergence, polynomial-divergence and total variation distance give identical results.
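Suppose that our views correspond to

$$\nu(B_i) = \alpha_i, \quad i = 1, 2, \ldots, k,$$

where the $B_i$'s are disjoint, $\cup_i B_i = \Omega$ and $\sum_{i=1}^k \alpha_i = 1$.

For instance, suppose L is a continuous random variable denoting the loss amount from a portfolio. Suppose also that there is a view that the probability of the loss exceeding a certain amount $x$ equals 1% (that is, $x$ is the 99% value-at-risk). This may be modeled as $\nu\{L > x\} = 1\%$ and $\nu\{L \le x\} = 99\%$.

I-divergence: Under the I-divergence setting, for any event $A$, the optimal $\nu$ satisfies

$$\nu(A) = \frac{\sum_i e^{\lambda_i}\mu(A \cap B_i)}{\sum_i e^{\lambda_i}\mu(B_i)}.$$

Select $\lambda_i$ so that $e^{\lambda_i} = \alpha_i/\mu(B_i)$. Then it follows that the specified views hold and

$$\nu(A) = \sum_i \alpha_i\, \frac{\mu(A \cap B_i)}{\mu(B_i)}. \qquad (29)$$

Polynomial-divergence: The analysis remains identical when we use polynomial-divergence with parameter $\beta$. Here, we see that the optimal $\nu$ satisfies

$$\nu(A) = \frac{\sum_i (1 + \beta\lambda_i)^{1/\beta}\mu(A \cap B_i)}{\sum_i (1 + \beta\lambda_i)^{1/\beta}\mu(B_i)}.$$

Again, by setting $(1 + \beta\lambda_i)^{1/\beta} = \alpha_i/\mu(B_i)$, (29) holds.

Total variation distance: If the objective is the total variation distance, then the objective function is never less than $\sum_i (\mu(B_i) - \alpha_i)^+$. Indeed, for any $\nu$ satisfying the views, the event $A^* = \cup_{i : \mu(B_i) > \alpha_i} B_i$ gives $\mu(A^*) - \nu(A^*) = \sum_i (\mu(B_i) - \alpha_i)^+$. We now show that $\nu$ defined by (29) achieves this lower bound. To see this, note that for any event $A$,

$$\mu(A) - \nu(A) = \sum_i \left(\mu(A \cap B_i) - \nu(A \cap B_i)\right) = \sum_i \frac{\mu(A \cap B_i)}{\mu(B_i)}\left(\mu(B_i) - \alpha_i\right).$$

Since each ratio $\mu(A \cap B_i)/\mu(B_i)$ lies in $[0, 1]$, this quantity lies between $-\sum_i (\alpha_i - \mu(B_i))^+$ and $\sum_i (\mu(B_i) - \alpha_i)^+$. These two bounds have the same absolute value because $\sum_i \mu(B_i) = \sum_i \alpha_i = 1$. Hence $|\mu(A) - \nu(A)| \le \sum_i (\mu(B_i) - \alpha_i)^+$ for every $A$. (When $k = 2$, this bound equals $\max_i |\mu(B_i) - \alpha_i|$.)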
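A brute-force check on a tiny finite space (with hypothetical $\mu$, blocks and $\alpha_i$) computes $\sup_A |\mu(A) - \nu(A)|$ for the $\nu$ of (29) by enumerating all subsets. In this instance the attained distance is 0.4, which equals $\sum_i (\mu(B_i) - \alpha_i)^+$. That is exactly the amount by which $\mu$ and any $\nu$ meeting the views must differ on $A = \cup_{i : \mu(B_i) > \alpha_i} B_i$. It exceeds $\max_i |\mu(B_i) - \alpha_i| = 0.25$.

```python
from itertools import combinations

def nu_from_views(mu, blocks, alphas):
    """nu(omega) = alpha_i * mu(omega) / mu(B_i) for omega in B_i, as in (29)."""
    nu = {}
    for block, alpha in zip(blocks, alphas):
        mass = sum(mu[w] for w in block)
        for w in block:
            nu[w] = alpha * mu[w] / mass
    return nu

def tv_bruteforce(p, q):
    """sup over all subsets A of |p(A) - q(A)| on a finite space (exponential cost)."""
    pts = list(p)
    best = 0.0
    for r in range(len(pts) + 1):
        for A in combinations(pts, r):
            best = max(best, abs(sum(p[w] - q[w] for w in A)))
    return best

mu = {0: 0.1, 1: 0.15, 2: 0.05, 3: 0.2, 4: 0.1, 5: 0.1, 6: 0.2, 7: 0.1}
blocks = [(0, 1), (2, 3), (4, 5), (6, 7)]
alphas = [0.5, 0.4, 0.05, 0.05]
nu = nu_from_views(mu, blocks, alphas)
print(tv_bruteforce(mu, nu))
```

Enumeration is only feasible for a handful of points. It is meant as a check of the argument, not as a method.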

7 Numerical Experiments 

Three simple experiments are conducted. In the first, we consider a calibration problem in which the distribution of the underlying Black-Scholes model is updated through polynomial-divergence based optimization to match observed option prices. We then consider a portfolio modeling problem in the Markowitz framework, where the VaR (value-at-risk) of a portfolio consisting of 6 global indices is evaluated. Here, the model parameters are estimated from historical data. We then use the proposed methodology to incorporate a view that the return from one of the indices has a t-distribution, along with views on the moments of returns of some linear combinations of the indices. In the third example, we empirically examine the parameter space where Assumption 2 holds, in a simple setting with two random variables and two constraints.

Example 6 Consider a security whose current price $S_0$ equals 50. Its volatility $\sigma$ is estimated to equal 0.2. Suppose that the interest rate $r$ is constant and equals 5%. Consider two liquid European call options on this security with maturity $T = 1$ year and strike prices $K_1 = 55$ and $K_2 = 60$. Their market prices are 5.00 and 3.00, respectively. It is easily checked that the Black-Scholes prices of these options at $\sigma = 0.2$ equal 3.02 and 1.62, respectively. It can also be easily checked that no value of $\sigma$ makes both Black-Scholes prices match the observed market prices.

We apply the polynomial-divergence methodology to arrive at a probability measure closest to the Black-Scholes measure while matching the observed market prices of the two liquid options. Note that under Black-Scholes

$$S(T) \sim \text{log-normal}\left(\log S(0) + \left(r - \frac{\sigma^2}{2}\right)T,\ \sigma^2 T\right) = \text{log-normal}(\log 50 + 0.03,\ 0.04).$$

This distribution is heavy-tailed in the sense that its moment generating function is not finite in any neighborhood of the origin. Let $f$ denote the pdf of the above log-normal distribution. We apply Theorem 2 with $\beta = 1$ to obtain the posterior distribution

$$\tilde f(x) = \frac{\left(1 + \lambda_1(x - K_1)^+ + \lambda_2(x - K_2)^+\right) f(x)}{\int_0^\infty \left(1 + \lambda_1(x - K_1)^+ + \lambda_2(x - K_2)^+\right) f(x)\, dx},$$

where $\lambda_1$ and $\lambda_2$ are solved from the constraint equations

$$E\left[e^{-rT}(S(T) - K_1)^+\right] = 5.00 \quad \text{and} \quad E\left[e^{-rT}(S(T) - K_2)^+\right] = 3.00. \qquad (30)$$

The solution comes out to be $\lambda_1 = 0.0945945$ and $\lambda_2 = -0.0357495$ (found using FindRoot of Mathematica). Note that $\lambda_1 > -1/(K_2 - K_1) = -0.2$ and $\lambda_1 + \lambda_2 > 0$, so that $1 + \lambda_1(x - K_1)^+ + \lambda_2(x - K_2)^+ > 0$ for all $x$. Therefore these values are feasible. Furthermore, since $\lambda_1 \times 5 + \lambda_2 \times 3 > 0$, they are strongly feasible as well. Plugging these values into $\tilde f(\cdot)$, we get the posterior density, which can be used to price other options of the same maturity. The second row of Table 1 shows the resulting European call option prices for different strike prices under this posterior distribution.
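The numbers in Example 6 can be reproduced without numerical integration. Every quantity needed is a log-normal partial moment $E[S^j 1\{S > K\}] = e^{jm + j^2v/2}\,\Phi\big((m - \log K)/\sqrt v + j\sqrt v\big)$, with $m = \log 50 + 0.03$ and $v = 0.04$. The sketch below is our own illustration, not the FindRoot computation used in the text. It takes the reported $\lambda_1, \lambda_2$ as given and assumes the $\beta = 1$ posterior $\tilde f \propto (1 + \lambda_1(x - K_1)^+ + \lambda_2(x - K_2)^+) f$. It then evaluates discounted call prices, which can be compared with the BS and Posterior I rows of Table 1. The helper names are hypothetical.

```python
import math

S0, SIGMA, R, T = 50.0, 0.2, 0.05, 1.0
M = math.log(S0) + (R - 0.5 * SIGMA ** 2) * T   # mean of log S(T)
V = SIGMA ** 2 * T                              # variance of log S(T)
SD = math.sqrt(V)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def partial_moment(j, K):
    """E[S(T)^j 1{S(T) > K}] under the Black-Scholes log-normal law."""
    d = (M - math.log(K)) / SD
    return math.exp(j * M + 0.5 * j * j * V) * norm_cdf(d + j * SD)

def undiscounted_call(K):
    return partial_moment(1, K) - K * partial_moment(0, K)

def call_product(K1, K2):
    """E[(S-K1)^+ (S-K2)^+]: both factors are positive only on S > max(K1, K2)."""
    K = max(K1, K2)
    return (partial_moment(2, K) - (K1 + K2) * partial_moment(1, K)
            + K1 * K2 * partial_moment(0, K))

def posterior_call(K, lams, strikes):
    """Discounted price of (S-K)^+ under f~ proportional to (1 + sum_i lam_i (x-K_i)^+) f."""
    Z = 1.0 + sum(l * undiscounted_call(Ki) for l, Ki in zip(lams, strikes))
    num = undiscounted_call(K) + sum(l * call_product(K, Ki) for l, Ki in zip(lams, strikes))
    return math.exp(-R * T) * num / Z
```

With `lams = (0.0945945, -0.0357495)` and `strikes = (55.0, 60.0)`, `posterior_call` should return about 5.00 and 3.00 at the two calibration strikes. The same call at other strikes prices the remaining options under Posterior I.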

Now suppose that the market prices of two more European options of the same maturity, with strike prices 50 and 65, are found to be 8.00 and 2.00, respectively. Under the above posterior distribution these prices equal 7.6978 and 1.6779, respectively. To arrive at a density function that agrees with these prices as well, we solve the associated four-constraint problem by adding the following constraint equations to (30):

$$E\left[e^{-rT}(S(T) - K)^+\right] = 8.00 \quad \text{and} \quad E\left[e^{-rT}(S(T) - K_3)^+\right] = 2.00, \qquad (31)$$

where $K = 50$ and $K_3 = 65$.

With these added constraints, we observe that the optimization problem $O_2(\beta)$ has no solution of the form (11). We then implement the proposed weighted least squares methodology to arrive at a perturbed solution. We use weights $w_1 = w_2 = w_3 = w_4 = 1/4$ and $t = 4 \times 10^{-3}$, so that each $t w_i$ equals $10^{-3}$. The new posterior is then of the form (11) with $\lambda$-values $\lambda_1 = 0.334604$, $\lambda_2 = -0.445519$, $\lambda_3 = -0.0890854$ and $\lambda_4 = 0.409171$. The third row of Table 1 gives the option prices under this new posterior. The last two rows report the option prices under the posterior measures resulting from the weight combinations $tw = (10^{-6}, 10^{-5}, 10^{-4}, 10^{-3})$ and $tw = (10^{-3}, 10^{-4}, 10^{-5}, 10^{-6})$, respectively.

Strike          50       55       60       65       70        75        80
BS              5.2253   3.0200   1.62374  0.8198   0.3925    0.1798    0.0795
Posterior I     7.6978   5.0000   3.0000   1.6779   0.8851    0.4443    0.2139
Posterior II    8.0016   4.9698   3.0752   1.9447   1.15306   0.63341   0.3276
Posterior III   8.0000   4.9991   3.0201   1.8530   1.0757    0.5821    0.2977
Posterior IV    7.7854   4.8243   3.0584   1.9954   1.2092    0.6743    0.3525

Table 1: Option prices for different strikes as computed by Black-Scholes and by different posterior distributions of the form given by (11). Here, BS stands for the Black-Scholes price at $\sigma = 0.2$. Posterior I is the optimal distribution corresponding to the two constraints (30). Posterior II (resp. III and IV) is the posterior distribution obtained by solving the perturbed problem with equal weights (resp. increasing and decreasing weights) given to the four constraints (30) and (31).

Example 7 We consider an equally weighted portfolio of six global indices:

- ASX (the S&P/ASX200, Australian stock index);
- DAX (the German stock index);
- EEM (the MSCI emerging market index);
- FTSE (the FTSE100, London Stock Exchange);
- Nikkei (the Nikkei225, Tokyo Stock Exchange);
- S&P (the Standard and Poor's 500).

Let $Z_1, Z_2, \ldots, Z_6$ denote the weekly rates of return⁵ from ASX, DAX, EEM, FTSE, Nikkei and S&P, respectively. We take the prior distribution of $(Z_1, Z_2, \ldots, Z_6)$ to be multivariate Gaussian. Its mean vector and variance-covariance matrix are estimated from historical prices of these indices over the last 161 weeks, obtained from Yahoo-finance. Assuming a notional amount of 1 million, the value-at-risk (VaR) of our portfolio at different confidence levels is shown in the second column of Table 3.

Next, suppose our view is that DAX will have an expected weekly return of 0.2% and will have a t-distribution with 3 degrees of freedom. Further, suppose that we expect all the indices to strengthen and give the expected weekly rates of return shown in Table 2. For example, the third row of that table expresses the view that the rate of return from emerging markets will be higher than that of FTSE by 0.2%. Expressed mathematically,

$$E[Z_3 - Z_4] = 0.002,$$

where $E$ is the expectation under the posterior probability. The other rows may be interpreted similarly.

We define new variables $X = Z_2$, $Y_1 = Z_1$, $Y_2 = Z_3 - Z_4$, $Y_3 = Z_4$, $Y_4 = Z_5$ and $Y_5 = Z_6$. Then our views are $E[Y] = (0.1\%, 0.2\%, 0.1\%, 0.2\%, 0.1\%)^T$ and that $X - 0.002$ has a standard t-distribution with 3 degrees of freedom.

The third column in Table 3 reports VaRs at different confidence levels under the posterior distribution that incorporates only the views on the expected returns (i.e., without the view that X has a t-distribution). These do not differ much from those under the prior distribution, because the views on the expected returns have little effect on the tail. The posterior distribution is Gaussian, and although the mean has shifted (the variance remains the same), the tail probability remains negligible. Contrast this with the effect of incorporating the view that X has a t-distribution. The VaRs under this posterior distribution (computed from 100,000 samples) are reported in the last column.

⁵ Using logarithmic rates of return gives almost identical results.

Index          Average weekly rate of return
ASX            0.001
DAX            0.002
EEM - FTSE     0.002
FTSE           0.001
Nikkei         0.002
S&P            0.001

Table 2: Expert view on average weekly return for different indices.

VaR at     Prior     Posterior 1    Posterior 2
99%        8500      8400           16000
97.5%      7200      7100           11200
95%        6000      5900           8200

Table 3: The second column reports VaR under the prior distribution. The third reports VaR under the posterior distribution incorporating views on expected returns only. The last column reports VaR under the posterior distribution incorporating views on expected returns as well as a view that returns from DAX are t-distributed with three degrees of freedom.

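Example 8 In this example we further refine an observation made in Proposition 2 and the examples that follow it. That observation is that the solution space where Assumption 2 holds typically increases with $n = 1/\beta$. We note that even in simple cases this need not always be true. Specifically, consider random variables X and Y such that

$$\begin{pmatrix} \log X \\ \log Y \end{pmatrix} \sim \text{bivariate Gaussian}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).$$

Then X and Y are log-normally distributed, and the joint density function of (X, Y) is given by

$$f(x,y) = \frac{1}{2\pi xy\sqrt{1-\rho^2}}\exp\left\{-\frac{(\log x)^2 + (\log y)^2 - 2\rho(\log x)(\log y)}{2(1-\rho^2)}\right\}, \quad x, y > 0.$$

Consider the constraints

$$\tilde E(X) = aE(X), \qquad \tilde E(Y) = bE(Y).$$

Our goal is to find values of $a$ and $b$ for which the associated optimization problem $O_2(\beta)$ has a solution. The probability distribution minimizing the polynomial-divergence with $\beta = 1/n$ is of the form

$$\tilde f(x,y) = \frac{\left(1 + \frac{\lambda}{n}x + \frac{\xi}{n}y\right)^n f(x,y)}{\iint \left(1 + \frac{\lambda}{n}x + \frac{\xi}{n}y\right)^n f(x,y)\, dx\, dy} = \frac{\left(1 + \frac{\lambda}{n}x + \frac{\xi}{n}y\right)^n f(x,y)}{E\left[\left(1 + \frac{\lambda}{n}X + \frac{\xi}{n}Y\right)^n\right]}.$$

Now, from the constraint equations, we have

$$\frac{E\left[X\left(1 + \frac{\lambda}{n}X + \frac{\xi}{n}Y\right)^n\right]}{E[X]\,E\left[\left(1 + \frac{\lambda}{n}X + \frac{\xi}{n}Y\right)^n\right]} = a \quad \text{and} \quad \frac{E\left[Y\left(1 + \frac{\lambda}{n}X + \frac{\xi}{n}Y\right)^n\right]}{E[Y]\,E\left[\left(1 + \frac{\lambda}{n}X + \frac{\xi}{n}Y\right)^n\right]} = b.$$

Note that since X and Y take values in $(0, \infty)$, only $\lambda \ge 0$ and $\xi \ge 0$ are feasible. Using ParametricPlot of Mathematica, we plot the pair of ratios above for $\lambda$ and $\xi$ in the range $[0, 10n]$. The corresponding figure depicts this range for one negative value of $\rho$, for $\rho = 0$ and for two positive values of $\rho$, in each case for $n = 1$, $n = 2$ and $n = 3$.

From the graphs it appears that the solution space strictly increases with $n$ when $\rho > 0$. However, this is not true for $\rho < 0$.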
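The ratios plotted in Example 8 have closed forms. When (log X, log Y) is standard bivariate normal with correlation $\rho$, $E[X^iY^j] = \exp\{(i^2 + 2\rho ij + j^2)/2\}$. For integer $n$, $(1 + \frac{\lambda}{n}X + \frac{\xi}{n}Y)^n$ can be expanded multinomially. The sketch below uses hypothetical helper names and is an alternative to Mathematica's ParametricPlot, not the authors' code. It returns the attainable pair $(a, b)$ for given $\lambda, \xi \ge 0$. Scanning $\lambda$ and $\xi$ over $[0, 10n]$ traces the regions discussed above.

```python
import math

def moment(i, j, rho):
    """E[X^i Y^j] when (log X, log Y) is standard bivariate normal with correlation rho."""
    return math.exp(0.5 * (i * i + 2 * rho * i * j + j * j))

def expect_poly(n, lam, xi, rho, extra_x=0, extra_y=0):
    """E[X^extra_x Y^extra_y (1 + lam X/n + xi Y/n)^n] via the multinomial expansion."""
    total = 0.0
    for q in range(n + 1):            # power of X from the bracket
        for r in range(n + 1 - q):    # power of Y from the bracket
            p = n - q - r             # power of the constant 1
            coef = math.factorial(n) / (math.factorial(p) * math.factorial(q) * math.factorial(r))
            total += coef * (lam / n) ** q * (xi / n) ** r * moment(q + extra_x, r + extra_y, rho)
    return total

def attainable_ab(n, lam, xi, rho):
    """(a, b) = (E~[X]/E[X], E~[Y]/E[Y]) under f~ proportional to (1 + lam x/n + xi y/n)^n f."""
    base = expect_poly(n, lam, xi, rho)
    a = expect_poly(n, lam, xi, rho, extra_x=1) / (moment(1, 0, rho) * base)
    b = expect_poly(n, lam, xi, rho, extra_y=1) / (moment(0, 1, rho) * base)
    return a, b
```

For $n = 1$ and $\xi = 0$, letting $\lambda \to \infty$ drives $a$ toward $E[X^2]/E[X]^2 = e$. This is the same boundary behavior seen in Example 3 with $\sigma = 1$.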

8 Conclusion 

In this article, we built upon existing methodologies that use relative entropy based ideas for incorporating mathematically specified views/constraints into a given financial model to arrive at a more accurate one when the underlying random variables are light tailed. In the existing financial literature, these ideas have found many applications, including in portfolio modeling, in model calibration and in ascertaining the pricing probability measure in incomplete markets. Our key contribution was to show that, under technical conditions, such constraints may be uniquely incorporated using polynomial-divergence even when the underlying random variables have fat tails. We also extended the proposed methodology to allow for constraints on marginal distributions of functions of underlying variables, in addition to the constraints on moments of functions of underlying random variables traditionally considered for such problems. Here, we considered both I-divergence and polynomial-divergence as objectives.

We also specialized our results to the Markowitz portfolio modeling framework, where a multivariate Gaussian distribution is used to model asset returns. Here, we developed closed-form solutions for the updated posterior distribution. When there is a constraint that the marginal of a single portfolio of assets has a fat-tailed distribution, we showed that under the posterior distribution, the marginals of all assets with non-zero correlation with this portfolio have similar fat-tailed distributions. This may be a reasonable and simple way to incorporate realistic tail behavior in a portfolio of assets.

We also qualitatively compared the solutions to the optimization problem in a simple setting of an exponentially distributed prior when the objective function was I-divergence, polynomial-divergence and total variation distance. We found that in certain settings, I-divergence may put more mass in the tails compared to polynomial-divergence, which may penalize tail deviation more. Finally, we numerically tested the proposed methodology on some simple examples.

Acknowledgement: The authors would like to thank Paul Glasserman for many directional 
suggestions as well as specific inputs on the manuscript that greatly helped this effort. We would 
also like to thank the Associate Editor and the referees for feedback that substantially improved
9 Appendix: Proofs

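Proof of Proposition 1. Recall that $f(x) = \frac{\alpha-1}{(1+x)^{\alpha}}$ for $x \ge 0$, and

$$f_{\lambda_M}(x) = \frac{e^{\lambda_M x} f(x)\, I_{[0,M]}(x)}{\int_0^M e^{\lambda_M x} f(x)\,dx},$$

where $\lambda_M$ is the unique solution to $\int_0^M x f_{\lambda}(x)\,dx = c$.

We first show that $\lambda_M \to 0$ as $M \to \infty$. To this end, let

$$g(\lambda, M) = \frac{\int_0^M x e^{\lambda x} f(x)\,dx}{\int_0^M e^{\lambda x} f(x)\,dx}, \qquad M > 1,\ \lambda > 0.$$

We have

$$\frac{\partial g}{\partial \lambda} = \frac{\left(\int_0^M x^2 e^{\lambda x} f(x)\,dx\right)\left(\int_0^M e^{\lambda x} f(x)\,dx\right) - \left(\int_0^M x e^{\lambda x} f(x)\,dx\right)^2}{\left(\int_0^M e^{\lambda x} f(x)\,dx\right)^2} = \int_0^M x^2 f_\lambda(x)\,dx - \left(\int_0^M x f_\lambda(x)\,dx\right)^2 = E_\lambda[X^2] - (E_\lambda[X])^2 > 0,$$

where $E_\lambda$ is the expectation operator w.r.t. the density function $f_\lambda$. Also

$$\frac{\partial g}{\partial M} = \frac{M e^{\lambda M} f(M) \int_0^M e^{\lambda x} f(x)\,dx - e^{\lambda M} f(M) \int_0^M x e^{\lambda x} f(x)\,dx}{\left(\int_0^M e^{\lambda x} f(x)\,dx\right)^2} = \frac{e^{\lambda M} f(M) \int_0^M (M - x)\, e^{\lambda x} f(x)\,dx}{\left(\int_0^M e^{\lambda x} f(x)\,dx\right)^2} > 0.$$

Since $\lambda_M$ satisfies $g(\lambda_M, M) = c$ and $g$ is increasing in both arguments, increasing $M$ leads to a reduction in $\lambda_M$. That is, $\lambda_M$ is a non-increasing function of $M$.

Suppose that $\lambda_M \downarrow c_1 > 0$. Then $c = g(\lambda_M, M) \ge g(c_1, M)$ for all $M$. But since $c_1 > 0$, the function $e^{c_1 x} f(x)$ is not integrable on $[0, \infty)$, so the tilted density $e^{c_1 x} f(x) I_{[0,M]}(x) / \int_0^M e^{c_1 x} f(x)\,dx$ assigns vanishing mass to any fixed interval $[0, K]$ as $M \to \infty$. Hence $g(c_1, M) \to \infty$ as $M \to \infty$, a contradiction. Therefore, $\lambda_M \to 0$ as $M \to \infty$.

Next, since

$$\int \log\left(\frac{f_\lambda(x)}{f(x)}\right) f_\lambda(x)\,dx = \lambda \int x f_\lambda(x)\,dx - \log\left(\int e^{\lambda x} f(x)\,dx\right),$$

we see that

$$\int_0^M \log\left(\frac{f_{\lambda_M}(x)}{f(x)}\right) f_{\lambda_M}(x)\,dx = \lambda_M c - \log\left(\int_0^M e^{\lambda_M x} f(x)\,dx\right).$$

Hence, to prove that the LHS converges to zero as $M \to \infty$, it suffices to show that $\int_0^M e^{\lambda_M x} f(x)\,dx \to 1$, or that $\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha}}\,dx \to \frac{1}{\alpha-1}$.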
Note that the constraint equation can be re-expressed as

$$\int_0^M x f_{\lambda_M}(x)\,dx = \frac{\int_0^M x \frac{e^{\lambda_M x}}{(1+x)^{\alpha}}\,dx}{\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha}}\,dx} = \frac{\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha-1}}\,dx}{\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha}}\,dx} - 1 = c,$$

or

$$\frac{\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha-1}}\,dx}{\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha}}\,dx} = 1 + c. \qquad (32)$$



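Further, by integration by parts of the numerator, we observe that

$$\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha-1}}\,dx = \frac{1}{\lambda_M}\left[ \frac{e^{\lambda_M M}}{(1+M)^{\alpha-1}} - 1 + (\alpha - 1)\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha}}\,dx \right].$$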



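From the above equation and (32), it follows that

$$\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha}}\,dx = \frac{\frac{e^{\lambda_M M}}{(1+M)^{\alpha-1}} - 1}{\lambda_M (c+1) - (\alpha-1)}. \qquad (33)$$

Since $\lambda_M \to 0$, it suffices to show that

$$\frac{e^{\lambda_M M}}{(1+M)^{\alpha-1}} \to 0.$$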



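Suppose this is not true. Then there exist an $\eta > 0$ and a sequence $M_i \uparrow \infty$ such that

$$\frac{e^{\lambda_{M_i} M_i}}{(1+M_i)^{\alpha-1}} \ge \eta.$$

Equation (32) may be re-expressed as

$$\int_0^M \frac{e^{\lambda_M x}}{(1+x)^{\alpha-1}}\left(1 - \frac{1+c}{1+x}\right) dx = 0. \qquad (34)$$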



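Given an arbitrary $K > 0$, one can find an $M_i$ with $M_i - K > 1 + 2c$ (so that $1 - \frac{1+c}{1+x} > \frac{1}{2}$ when $x > M_i - K$) such that, for $x \in [M_i - K, M_i]$,

$$\frac{e^{\lambda_{M_i} x}}{(1+x)^{\alpha-1}} \ge \frac{e^{\lambda_{M_i}(M_i - K)}}{(1+M_i)^{\alpha-1}} \ge \frac{\eta}{2};$$

the last inequality holds for all large $i$, since the middle term is at least $\eta\, e^{-\lambda_{M_i} K}$ and $\lambda_{M_i} \to 0$.

Re-expressing the LHS of (34) evaluated at $M = M_i$ as

$$\int_0^{M_i - K} \frac{e^{\lambda_{M_i} x}}{(1+x)^{\alpha-1}}\left(1 - \frac{1+c}{1+x}\right) dx + \int_{M_i - K}^{M_i} \frac{e^{\lambda_{M_i} x}}{(1+x)^{\alpha-1}}\left(1 - \frac{1+c}{1+x}\right) dx,$$

we see that this is bounded from below by $-c\, e^{\lambda_{M_i} c} + K\eta/4$. For sufficiently large $K$, this is greater than zero, providing the desired contradiction to (34). □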
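The behaviour established in this proof can be observed numerically. The Python sketch below assumes, purely for illustration, $\alpha = 3$ and $c = 2$ (so that $c$ exceeds $E_f[X] = 1/(\alpha - 2) = 1$). It solves $g(\lambda, M) = c$ for $\lambda_M$ by bisection, which is valid because $g$ is increasing in $\lambda$, and evaluates the divergence $\lambda_M c - \log \int_0^M e^{\lambda_M x} f(x)\,dx$ with Simpson's rule. The helper names and the quadrature scheme are our own choices, not part of the paper.

```python
import math

ALPHA = 3.0  # illustrative tail index of f(x) = (alpha - 1) / (1 + x)^alpha
C = 2.0      # illustrative target mean, above E_f[X] = 1 / (alpha - 2) = 1


def tilted_stats(lam: float, M: float, n: int = 2000) -> tuple[float, float]:
    """Return (log Z, mean of f_lam) on [0, M], where Z = int_0^M e^{lam x} f(x) dx.

    Simpson's rule is applied in u = log(1 + x), where the heavy-tailed integrand is
    smooth; the factor e^{lam M} is pulled out so that large lam * M cannot overflow.
    """
    h = math.log1p(M) / n
    z = m = 0.0
    for k in range(n + 1):
        x = math.expm1(k * h)
        w = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)   # Simpson weights
        # e^{lam (x - M)} f(x) dx/du, with dx/du = 1 + x
        val = w * math.exp(lam * (x - M)) * (ALPHA - 1.0) / (1.0 + x) ** (ALPHA - 1.0)
        z += val
        m += val * x
    return lam * M + math.log(z * h / 3.0), m / z


def solve_lambda(M: float, c: float = C, tol: float = 1e-12) -> float:
    """Bisection for g(lam, M) = c; g is increasing in lam since its derivative is a variance."""
    lo, hi = 0.0, 1.0
    while tilted_stats(hi, M)[1] < c:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_stats(mid, M)[1] < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def divergence(M: float, c: float = C) -> tuple[float, float]:
    """Return (lambda_M, lambda_M * c - log int_0^M e^{lambda_M x} f(x) dx)."""
    lam = solve_lambda(M, c)
    log_z, _ = tilted_stats(lam, M)
    return lam, lam * c - log_z


for M in (10.0, 100.0, 1000.0):
    lam, kl = divergence(M)
    print(f"M = {M:7.0f}  lambda_M = {lam:.5f}  divergence = {kl:.5f}")
```

Both $\lambda_M$ and the divergence shrink as $M$ grows, in line with the proposition, even though no exponentially tilted density on $[0, \infty)$ meets the constraint.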

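Proof of Theorem 2. Let $\xi = \int \left(1 + \beta \sum_i \lambda_i g_i\right)^{1/\beta} d\mu$ and $\tilde\lambda_i = (\beta+1)\beta\lambda_i/\xi^\beta$. Consider the Lagrangian $\mathcal{L}(\nu)$ for $\mathcal{O}_2(\beta)$ defined as

$$\mathcal{L}(\nu) = \int \left(\frac{d\nu}{d\mu}\right)^{\beta+1} d\mu - \sum_i \tilde\lambda_i \left( \int g_i\,d\nu - c_i \right). \qquad (35)$$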



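We first argue that $\mathcal{L}(\nu)$ is a convex function of $\nu$. Given that the terms $\tilde\lambda_i \int g_i\,d\nu$ are linear in $\nu$, it suffices to show that $\int \left(\frac{d\nu}{d\mu}\right)^{\beta+1} d\mu$ is a convex function of $\nu$. Note that for $0 \le s \le 1$,

$$\int \left(\frac{d(s\nu_1 + (1-s)\nu_2)}{d\mu}\right)^{\beta+1} d\mu$$

equals

$$\int \left(s\frac{d\nu_1}{d\mu} + (1-s)\frac{d\nu_2}{d\mu}\right)^{\beta+1} d\mu,$$

which in turn is dominated by

$$\int \left[ s\left(\frac{d\nu_1}{d\mu}\right)^{\beta+1} + (1-s)\left(\frac{d\nu_2}{d\mu}\right)^{\beta+1} \right] d\mu,$$

which equals

$$s\int \left(\frac{d\nu_1}{d\mu}\right)^{\beta+1} d\mu + (1-s)\int \left(\frac{d\nu_2}{d\mu}\right)^{\beta+1} d\mu.$$

Therefore, the Lagrangian $\mathcal{L}(\nu)$ is a convex function of $\nu$.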

We now prove that $\mathcal{L}(\nu)$ is minimized at $\nu^\lambda$. For this, all we need to show is that we cannot improve by moving in any feasible direction away from $\nu^\lambda$. Since $\nu^\lambda$ satisfies all the constraints, the result then follows. We now show this.



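Let $f$ denote $\frac{d\nu}{d\mu}$ and $f_\lambda = \frac{d\nu^\lambda}{d\mu}$. Note that (35) may be re-expressed as

$$\int \left( f^{\beta+1} - \sum_i \tilde\lambda_i g_i f \right) d\mu + \sum_i \tilde\lambda_i c_i.$$

For any $\nu \in \mathcal{P}(\mu)$ and $t \in [0, 1]$, consider the function

$$G_\nu(t) = \mathcal{L}\left((1-t)\nu^\lambda + t\nu\right).$$

This in turn equals

$$\int \left\{ \left((1-t)f_\lambda + tf\right)^{\beta+1} - \sum_i \tilde\lambda_i g_i \left((1-t)f_\lambda + tf\right) \right\} d\mu + \sum_i \tilde\lambda_i c_i.$$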



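We now argue that $\frac{d}{dt}G_\nu(t)\big|_{t=0} = 0$. From this and the convexity of $\mathcal{L}$, the result follows. To see this, note that $\frac{d}{dt}G_\nu(t)\big|_{t=0}$ equals

$$\int \left\{ (\beta+1) f_\lambda^{\beta} - \sum_i \tilde\lambda_i g_i \right\} (f - f_\lambda)\,d\mu. \qquad (36)$$

Due to the definition of $f_\lambda$ and $\tilde\lambda_i$, the term inside the braces in the integrand in (36) is a constant; indeed, it equals $(\beta+1)/\xi^\beta$. Since both $\nu^\lambda$ and $\nu$ are probability measures, $\int (f - f_\lambda)\,d\mu = 0$; therefore $\frac{d}{dt}G_\nu(t)\big|_{t=0} = 0$, and the result follows. □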
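The first-order argument above can be checked on a small discrete example. In the Python sketch below, $\mu$ is uniform on a few points, there is a single constraint function $g(x) = x$, and $\beta = 0.5$, $\lambda = 0.8$; all of these are illustrative assumptions. We form $f_\lambda \propto (1 + \beta\lambda g)^{1/\beta}$, take $c$ to be its mean, and perturb it along random directions that preserve both the total mass and $\int g\,d\nu$. Every perturbation increases $\int f^{\beta+1}\,d\mu$, as the theorem predicts.

```python
import random

BETA, LAM = 0.5, 0.8                  # illustrative parameters
XS = [0.5 * k for k in range(7)]      # support of mu: 0, 0.5, ..., 3; g(x) = x
W = [1.0 / len(XS)] * len(XS)         # mu is uniform on XS
ONES = [1.0] * len(XS)


def inner(u, v):
    """<u, v> = int u v dmu."""
    return sum(wi * ui * vi for wi, ui, vi in zip(W, u, v))


def objective(f):
    """int f^(beta + 1) dmu for a density f = dnu/dmu."""
    return sum(wi * fi ** (BETA + 1) for wi, fi in zip(W, f))


# Candidate optimum: dnu/dmu proportional to (1 + beta * lambda * g)^(1/beta)
raw = [(1 + BETA * LAM * x) ** (1 / BETA) for x in XS]
f_lam = [r / inner(raw, ONES) for r in raw]
c = inner(XS, f_lam)                  # the moment constraint that f_lam satisfies


def feasible_direction(rng):
    """Random d with int d dmu = 0 and int g d dmu = 0."""
    g_c = [x - inner(XS, ONES) for x in XS]            # g centred, orthogonal to 1
    d = [rng.gauss(0.0, 1.0) for _ in XS]
    d = [di - inner(d, ONES) for di in d]              # remove the component along 1
    coef = inner(d, g_c) / inner(g_c, g_c)
    return [di - coef * gi for di, gi in zip(d, g_c)]  # remove the component along g


rng = random.Random(0)
gaps, violations = [], []
for _ in range(100):
    d = feasible_direction(rng)
    eps = 0.5 * min(f_lam) / max(abs(di) for di in d)  # keeps nu a positive density
    nu = [fi + eps * di for fi, di in zip(f_lam, d)]
    gaps.append(objective(nu) - objective(f_lam))
    violations.append(abs(inner(XS, nu) - c) + abs(inner(nu, ONES) - 1.0))
print(min(gaps), max(violations))
```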
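Proof of Theorem 3. Let $\mathcal{A}$ denote the set of all strongly feasible solutions to (12). Consider the following set of equations:

$$\frac{\int g_j \left(1 + \beta\sum_{i=1}^k \theta_i (g_i - c_i)\right)^{1/\beta} d\mu}{\int \left(1 + \beta\sum_{i=1}^k \theta_i (g_i - c_i)\right)^{1/\beta} d\mu} = c_j \quad \text{for } j = 1, 2, \ldots, k. \qquad (37)$$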



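We say that $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ is a strongly feasible solution to (37) if it solves (37) and lies in the set

$$\left\{ \theta \in \mathbb{R}^k \;:\; 1 + \beta\sum_{i=1}^k \theta_i (g_i - c_i) > 0 \ \ \mu\text{-a.e.},\ \ \int \left(1 + \beta\sum_{i=1}^k \theta_i (g_i - c_i)\right)^{1/\beta + 1} d\mu < \infty \ \text{ and } \ \beta\sum_{i=1}^k \theta_i c_i < 1 \right\}.$$

Let $\mathcal{B}$ denote the set of all strongly feasible solutions to (37).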



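Let $\phi : \mathcal{A} \to \mathcal{B}$ and $\psi : \mathcal{B} \to \mathcal{A}$ be the mappings defined by

$$\phi(\lambda) = \left( \frac{\lambda_1}{1 + \beta\sum_{i=1}^k \lambda_i c_i}, \frac{\lambda_2}{1 + \beta\sum_{i=1}^k \lambda_i c_i}, \ldots, \frac{\lambda_k}{1 + \beta\sum_{i=1}^k \lambda_i c_i} \right)$$

and

$$\psi(\theta) = \left( \frac{\theta_1}{1 - \beta\sum_{i=1}^k \theta_i c_i}, \frac{\theta_2}{1 - \beta\sum_{i=1}^k \theta_i c_i}, \ldots, \frac{\theta_k}{1 - \beta\sum_{i=1}^k \theta_i c_i} \right).$$

It is easily checked that the mapping $\phi$ is a bijection with inverse $\psi$. Note that if $\lambda \in \mathcal{A}$, then $\phi(\lambda) \in \mathcal{B}$. To see this, simply divide the numerator and the denominator in (12) by $\left(1 + \beta\sum_{i=1}^k \lambda_i c_i\right)^{1/\beta}$. Conversely, if $\theta \in \mathcal{B}$, then $\psi(\theta) \in \mathcal{A}$. To see this, divide the numerator and the denominator in (37) by $\left(1 - \beta\sum_{i=1}^k \theta_i c_i\right)^{1/\beta}$.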



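Therefore, it suffices to prove uniqueness of strongly feasible solutions to (37). To this end, consider the function $G : \mathcal{B} \to \mathbb{R}_+$:

$$G(\theta) = \int \left(1 + \beta\sum_{l=1}^k \theta_l (g_l - c_l)\right)^{1/\beta + 1} d\mu.$$

Then,

$$\frac{\partial G}{\partial \theta_i} = (1+\beta)\int (g_i - c_i)\left(1 + \beta\sum_{l=1}^k \theta_l (g_l - c_l)\right)^{1/\beta} d\mu \qquad (38)$$

and

$$\frac{\partial^2 G}{\partial \theta_j \partial \theta_i} = (1+\beta)\int (g_i - c_i)(g_j - c_j)\left(1 + \beta\sum_{l=1}^k \theta_l (g_l - c_l)\right)^{1/\beta - 1} d\mu.$$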



We see that the last integral can be written as $E_\theta[(g_i - c_i)(g_j - c_j)]$ times a positive constant independent of $i$ and $j$, where $E_\theta$ denotes expectation under the measure

$$\frac{\left(1 + \beta\sum_{l=1}^k \theta_l (g_l - c_l)\right)^{1/\beta - 1} d\mu}{\int \left(1 + \beta\sum_{l=1}^k \theta_l (g_l - c_l)\right)^{1/\beta - 1} d\mu}.$$

Now from the identity

$$E_\theta[(g_i - c_i)(g_j - c_j)] = \mathrm{Cov}_\theta[g_i, g_j] + (E_\theta[g_i] - c_i)(E_\theta[g_j] - c_j)$$

and the assumption on the $g_i$'s, it follows that the Hessian of the function $G$ is positive definite. Thus, $G$ is strictly convex in its domain of definition, that is, in $\mathcal{B}$. Therefore, if a solution to the equation

$$\left( \frac{\partial G}{\partial \theta_1}, \frac{\partial G}{\partial \theta_2}, \ldots, \frac{\partial G}{\partial \theta_k} \right) = 0 \qquad (39)$$

exists in $\mathcal{B}$, then it is unique. From (38) it follows that the sets of equations given by (39) and (37) are equivalent. □
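The reparametrization in this proof also suggests a computational route: because $G$ is strictly convex, a damped Newton method on $G$, using the gradient (38) and the Hessian above, finds the unique solution $\theta$ of (37), and $\psi(\theta)$ then recovers $\lambda$. The Python sketch below illustrates this for a uniform $\mu$ on eleven points, $g_1(x) = x$, $g_2(x) = x^2$ and $\beta = 0.5$. The measure, the functions and the parameters `LAM_TRUE` used to generate $c$ are all illustrative assumptions, and the backtracking rule is our own choice.

```python
BETA = 0.5
XS = [0.5 * k for k in range(11)]          # support of mu (uniform): 0, 0.5, ..., 5
W = [1.0 / len(XS)] * len(XS)
GS = [lambda x: x, lambda x: x * x]        # constraint functions g_1, g_2
LAM_TRUE = [0.3, 0.05]                     # illustrative parameters used to generate c

# c_j under the density proportional to (1 + beta * sum_i lam_i g_i)^(1/beta)
dens = [(1 + BETA * sum(l * g(x) for l, g in zip(LAM_TRUE, GS))) ** (1 / BETA) for x in XS]
Z = sum(w * d for w, d in zip(W, dens))
CS = [sum(w * d * g(x) for w, d, x in zip(W, dens, XS)) / Z for g in GS]


def G_derivs(theta):
    """Return (G, gradient (38), Hessian) at theta, or None outside the domain."""
    k = len(CS)
    val, grad, hess = 0.0, [0.0] * k, [[0.0] * k for _ in range(k)]
    for w, x in zip(W, XS):
        d = [g(x) - c for g, c in zip(GS, CS)]
        base = 1 + BETA * sum(t * di for t, di in zip(theta, d))
        if base <= 0:
            return None
        val += w * base ** (1 / BETA + 1)
        for i in range(k):
            grad[i] += w * (1 + BETA) * d[i] * base ** (1 / BETA)
            for j in range(k):
                hess[i][j] += w * (1 + BETA) * d[i] * d[j] * base ** (1 / BETA - 1)
    return val, grad, hess


def solve2(h, r):
    """Solve the 2x2 linear system h s = r by Cramer's rule."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [(r[0] * h[1][1] - h[0][1] * r[1]) / det,
            (h[0][0] * r[1] - h[1][0] * r[0]) / det]


def newton_theta(tol=1e-12, max_iter=100):
    theta = [0.0, 0.0]
    for _ in range(max_iter):
        val, grad, hess = G_derivs(theta)
        if max(abs(gi) for gi in grad) < tol:
            break
        step = solve2(hess, [-gi for gi in grad])
        slope = sum(gi * si for gi, si in zip(grad, step))   # negative: descent direction
        t = 1.0
        while t > 1e-12:   # backtrack to stay in the domain and decrease G
            trial = [th + t * s for th, s in zip(theta, step)]
            out = G_derivs(trial)
            if out is not None and out[0] <= val + 0.25 * t * slope:
                break
            t *= 0.5
        theta = trial
    return theta


def psi(theta):
    """lambda_i = theta_i / (1 - beta * sum_j theta_j c_j)."""
    denom = 1 - BETA * sum(t * c for t, c in zip(theta, CS))
    return [t / denom for t in theta]


theta = newton_theta()
print(theta, psi(theta))
```

The recovered $\psi(\theta)$ coincides with `LAM_TRUE`, consistent with the uniqueness statement of the theorem.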

Proof of Proposition 3. First assume that $c \notin \mathcal{C}$, so that $\mathcal{O}_2(\beta, c)$ has no solution. Note that $c_0$ may not belong to $\mathcal{C}$, so that $\mathcal{O}_2(\beta, c_0)$ may also not have a solution. Let $B(x, r)$ denote the open ball centered at $x$ with radius $r$ with respect to the metric $d$ defined in Section 3.3. For arbitrary $\epsilon > 0$, let $c_\epsilon \in B(c_0, \epsilon) \cap \mathcal{C}$. Then $\mathcal{O}_2(\beta, c_\epsilon)$ has a solution, say $\nu_\epsilon$, and let $\lambda(\epsilon)$ denote the associated parameters. It follows that $(\lambda(\epsilon), c_\epsilon - c)$ is feasible for $\mathcal{O}_2(\beta, t, c)$ for any $t$. Since $(\lambda_t, y_t)$ is a solution to $\mathcal{O}_2(\beta, t, c)$, we have

$$I_\beta(\nu_{\lambda_t} \| \mu) + \frac{1}{t}\, d(c, c + y_t) \le I_\beta(\nu_\epsilon \| \mu) + \frac{1}{t}\, d(c, c_\epsilon).$$

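But, by the triangle inequality and the definition of $c_\epsilon$, it follows that

$$d(c, c_\epsilon) \le d(c, c_0) + d(c_0, c_\epsilon) \le d(c, c_0) + \epsilon.$$

Therefore,

$$I_\beta(\nu_{\lambda_t} \| \mu) + \frac{1}{t}\, d(c, c + y_t) \le I_\beta(\nu_\epsilon \| \mu) + \frac{1}{t}\, d(c, c_0) + \frac{\epsilon}{t}.$$

Since $I_\beta(\nu_{\lambda_t} \| \mu) \ge 0$, we have

$$d(c, c + y_t) \le t\, I_\beta(\nu_\epsilon \| \mu) + d(c, c_0) + \epsilon.$$

For all $t$ small enough that $t\, I_\beta(\nu_\epsilon \| \mu) \le \epsilon$, we conclude that

$$d(c, c + y_t) \le d(c, c_0) + 2\epsilon. \qquad (40)$$

Now, since $c + y_t \in \mathcal{C}$, by definition of $c_0$ we have

$$d(c, c + y_t) \ge d(c, c_0).$$

Together with inequality (40), and since $\epsilon > 0$ is arbitrary, we have

$$\lim_{t \to 0} d(c, c + y_t) = d(c, c_0).$$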

If $c \in \mathcal{C}$, then $\mathcal{O}_2(\beta, c)$ has a solution. Then the above analysis simplifies: $c_0 = c$ and each $c_\epsilon$ can be taken to be equal to $c$. We conclude that

$$\lim_{t \to 0} d(c, c + y_t) = 0,$$

or $y_t \to 0$. □

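Proof of Theorem 4. In view of (18), we may fix the marginal distribution of $X$ to be $g(x)$ and re-express the objective as

$$\min_{\tilde f(\cdot|x) \in \mathcal{P}(f(\cdot|x)),\ \forall x} \int_{x,y} \log\left(\frac{\tilde f(y|x)}{f(y|x)}\right) \tilde f(y|x)\, g(x)\,dy\,dx + \int_x \log\left(\frac{g(x)}{f(x)}\right) g(x)\,dx.$$

The second integral is a constant and can be dropped from the objective. The first integral may in turn be expressed as

$$\int_x \left( \min_{\tilde f(\cdot|x) \in \mathcal{P}(f(\cdot|x))} \int_y \log\left(\frac{\tilde f(y|x)}{f(y|x)}\right) \tilde f(y|x)\,dy \right) g(x)\,dx.$$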

Similarly, the moment constraints can be re-expressed as

$$\int_{x,y} h_i(x,y)\, \tilde f(y|x)\, g(x)\,dx\,dy = c_i, \quad i = 1, 2, \ldots, k,$$

or

$$\int_x \left( \int_y h_i(x,y)\, \tilde f(y|x)\,dy \right) g(x)\,dx = c_i, \quad i = 1, 2, \ldots, k.$$

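Then, the Lagrangian for this $k$-constraint problem is

$$\int_x \left[ \min_{\tilde f(\cdot|x) \in \mathcal{P}(f(\cdot|x))} \int_y \left( \log\left(\frac{\tilde f(y|x)}{f(y|x)}\right) \tilde f(y|x) - \sum_i \lambda_i h_i(x,y)\, \tilde f(y|x) \right) dy \right] g(x)\,dx + \sum_i \lambda_i c_i.$$

Note that, by Theorem 1,

$$\min_{\tilde f(\cdot|x) \in \mathcal{P}(f(\cdot|x))} \int_y \left( \log\left(\frac{\tilde f(y|x)}{f(y|x)}\right) \tilde f(y|x) - \sum_i \delta_i h_i(x,y)\, \tilde f(y|x) \right) dy$$

has the solution

$$f_\delta(y|x) = \frac{\exp\left(\sum_i \delta_i h_i(x,y)\right) f(y|x)}{\int_y \exp\left(\sum_i \delta_i h_i(x,y)\right) f(y|x)\,dy} = \frac{\exp\left(\sum_i \delta_i h_i(x,y)\right) f(x,y)}{\int_y \exp\left(\sum_i \delta_i h_i(x,y)\right) f(x,y)\,dy},$$

where we write $\delta$ for $(\delta_1, \delta_2, \ldots, \delta_k)$. Now, taking $\delta = \lambda$, it follows from Assumption 3 that $f_\lambda(x,y) = f_\lambda(y|x)\, g(x)$ is a solution to $\mathcal{O}_3$. □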

Proof of Theorem 5. Let $F : \mathbb{R}^k \to \mathbb{R}$ be the function defined as

$$F(\lambda) = \int_x \log\left( \int_y \exp\left( \sum_i \lambda_i h_i(x,y) \right) f(y|x)\,dy \right) g(x)\,dx - \sum_i \lambda_i c_i.$$
Then,

$$\frac{\partial F}{\partial \lambda_i} = \int_x \left( \frac{\int_y h_i(x,y) \exp\left(\sum_l \lambda_l h_l(x,y)\right) f(y|x)\,dy}{\int_y \exp\left(\sum_l \lambda_l h_l(x,y)\right) f(y|x)\,dy} \right) g(x)\,dx - c_i$$

$$= \int_x \left( \int_y h_i(x,y)\, f_\lambda(y|x)\,dy \right) g(x)\,dx - c_i = \int_x \int_y h_i(x,y)\, f_\lambda(x,y)\,dy\,dx - c_i = E_\lambda[h_i(X,Y)] - c_i.$$

Hence the set of equations given by (20) is equivalent to

$$\left( \frac{\partial F}{\partial \lambda_1}, \frac{\partial F}{\partial \lambda_2}, \ldots, \frac{\partial F}{\partial \lambda_k} \right) = 0. \qquad (41)$$

Since

$$\frac{\partial}{\partial \lambda_j} f_\lambda(y|x) = h_j(x,y)\, f_\lambda(y|x) - \left( \int_y h_j(x,y)\, f_\lambda(y|x)\,dy \right) f_\lambda(y|x),$$

we have

$$\frac{\partial^2 F}{\partial \lambda_j \partial \lambda_i} = \int_x \left( \int_y h_i(x,y) h_j(x,y)\, f_\lambda(y|x)\,dy \right) g(x)\,dx - \int_x \left( \int_y h_i(x,y)\, f_\lambda(y|x)\,dy \right)\left( \int_y h_j(x,y)\, f_\lambda(y|x)\,dy \right) g(x)\,dx$$

$$= E_{g(x)}\left[ E_\lambda[h_i(X,Y) h_j(X,Y) \mid X] \right] - E_{g(x)}\left[ E_\lambda[h_i(X,Y) \mid X] \times E_\lambda[h_j(X,Y) \mid X] \right] = E_{g(x)}\left[ \mathrm{Cov}_\lambda[h_i(X,Y), h_j(X,Y) \mid X] \right],$$

where $E_{g(x)}$ denotes expectation with respect to the density function $g(x)$. By our assumption, it follows that the Hessian of $F$ is positive definite. Thus, the function $F$ is strictly convex in $\mathbb{R}^k$. Therefore, if there exists a solution to (41), then it is unique. Since (41) is equivalent to (20), the theorem follows. □
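Theorems 4 and 5 together give a recipe that is straightforward to implement when $X$ and $Y$ take finitely many values: tilt each conditional $f(\cdot|x)$ exponentially, reweight by the prescribed marginal $g$, and find $\lambda$ by minimizing the strictly convex $F$ with Newton's method, using the gradient $E_\lambda[h_i(X,Y)] - c_i$ and the Hessian $E_{g(x)}[\mathrm{Cov}_\lambda[h_i, h_j \mid X]]$ derived above. The Python sketch below does this for a single constraint $h(x,y) = y$; the prior table `F_COND`, the marginal `G` and the target `C` are illustrative assumptions.

```python
import math

YS = [0.0, 1.0, 2.0, 3.0]                 # values of Y; h(x, y) = y
F_COND = [[0.40, 0.30, 0.20, 0.10],       # illustrative prior f(y | x), one row per value of x
          [0.25, 0.25, 0.25, 0.25],
          [0.10, 0.20, 0.30, 0.40]]
G = [0.2, 0.3, 0.5]                       # prescribed marginal g of X
C = 2.0                                   # target value of E[h(X, Y)]


def tilted_conditional(row, lam):
    """f_lambda(y | x), proportional to exp(lambda * h(x, y)) f(y | x)."""
    weights = [math.exp(lam * y) * p for y, p in zip(YS, row)]
    total = sum(weights)
    return [w / total for w in weights]


def grad_hess(lam):
    """F'(lambda) = E_lambda[h] - c and F''(lambda) = E_g[Var_lambda(h | X)]."""
    grad, hess = -C, 0.0
    for gx, row in zip(G, F_COND):
        q = tilted_conditional(row, lam)
        m1 = sum(y * p for y, p in zip(YS, q))
        m2 = sum(y * y * p for y, p in zip(YS, q))
        grad += gx * m1
        hess += gx * (m2 - m1 * m1)
    return grad, hess


def solve_lambda(tol=1e-12, max_iter=50):
    """Newton's method on the strictly convex function F."""
    lam = 0.0
    for _ in range(max_iter):
        grad, hess = grad_hess(lam)
        if abs(grad) < tol:
            break
        lam -= grad / hess
    return lam


lam = solve_lambda()
joint = [[gx * p for p in tilted_conditional(row, lam)] for gx, row in zip(G, F_COND)]
print(lam, [sum(r) for r in joint])       # row sums reproduce the marginal g
```

By construction the joint table $f_\lambda(x,y) = g(x) f_\lambda(y|x)$ has $X$-marginal exactly $g$, so only the moment constraint needs to be solved for.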



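Proof of Theorem 6. Fixing the marginal of $X$ to be $g(x)$, we express the objective as

$$\min_{\tilde f(\cdot|x) \in \mathcal{P}(f(\cdot|x)),\ \forall x} \int_{x,y} \left( \frac{\tilde f(y|x)\, g(x)}{f(y|x)\, f(x)} \right)^{\beta} \tilde f(y|x)\, g(x)\,dy\,dx.$$

This may in turn be expressed as

$$\int_x \left( \min_{\tilde f(\cdot|x) \in \mathcal{P}(f(\cdot|x))} \int_y \left( \frac{\tilde f(y|x)}{f(y|x)} \right)^{\beta} \tilde f(y|x)\,dy \right) \left( \frac{g(x)}{f(x)} \right)^{\beta} g(x)\,dx.$$

Similarly, the moment constraints can be re-expressed as

$$\int_x \left( \int_y h_i(x,y) \left( \frac{f(x)}{g(x)} \right)^{\beta} \tilde f(y|x)\,dy \right) \left( \frac{g(x)}{f(x)} \right)^{\beta} g(x)\,dx = c_i, \quad i = 1, 2, \ldots, k.$$

Then, the Lagrangian for this $k$-constraint problem is, up to the constant $\sum_i \lambda_i c_i$,

$$\int_x \left[ \min_{\tilde f(\cdot|x) \in \mathcal{P}(f(\cdot|x))} \int_y \left( \left( \frac{\tilde f(y|x)}{f(y|x)} \right)^{\beta} - \sum_i \lambda_i h_i(x,y) \left( \frac{f(x)}{g(x)} \right)^{\beta} \right) \tilde f(y|x)\,dy \right] \left( \frac{g(x)}{f(x)} \right)^{\beta} g(x)\,dx.$$

By Theorem 2, for a generic parameter vector $\delta$ in place of $\lambda$, the inner minimization has the solution $f^{\beta}_{\delta}(y|x)$. Now, taking $\delta = \lambda$, it follows from the standing assumption for this problem that $f^{\beta}_{\lambda}(x,y) = f^{\beta}_{\lambda}(y|x)\, g(x)$ is the solution to $\mathcal{O}_4(\beta)$. □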

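Proof of Proposition 2. In view of the assumption on $g$, we note that $1 + \frac{\lambda}{n} g(x) > 0$ for all $x \ge 0$ if $\lambda \ge 0$. By Theorem 2, the probability distribution minimizing the polynomial-divergence (with $\beta = 1/n$) w.r.t. $f$ is given by

$$\tilde f(x) = \frac{\left(1 + \frac{\lambda}{n} g(x)\right)^n f(x)}{\int_0^\infty \left(1 + \frac{\lambda}{n} g(x)\right)^n f(x)\,dx},$$

where

$$\int_0^\infty \left(1 + \frac{\lambda}{n} g(x)\right)^n f(x)\,dx = \sum_{k=0}^n n^{-k} \binom{n}{k} E[g(X)^k]\, \lambda^k.$$

From the constraint equation we have

$$a = \frac{\tilde E[g(X)]}{E[g(X)]} = \frac{\int_0^\infty g(x) \left(1 + \frac{\lambda}{n} g(x)\right)^n f(x)\,dx}{E[g(X)] \int_0^\infty \left(1 + \frac{\lambda}{n} g(x)\right)^n f(x)\,dx} = \frac{\sum_{k=0}^n n^{-k} \binom{n}{k} E[g(X)^{k+1}]\, \lambda^k}{\sum_{k=0}^n n^{-k} \binom{n}{k} E[g(X)]\, E[g(X)^k]\, \lambda^k},$$

that is,

$$\sum_{k=0}^n n^{-k} \binom{n}{k} \left( E[g(X)^{k+1}] - a\, E[g(X)]\, E[g(X)^k] \right) \lambda^k = 0.$$

Since $E[g(X)^{n+1}] > a\, E[g(X)]\, E[g(X)^n]$, the $n$-th degree term in (14) is strictly positive, and the constant term, which equals $(1 - a)E[g(X)]$, is negative, so there exists a positive $\lambda$ that solves this equation. Uniqueness of the solution now follows from Theorem 3. □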
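Since the equation for $\lambda$ in this proof is a polynomial whose coefficients are moments of $g(X)$, it can be solved by bisection once these moments are known. The Python sketch below assumes, for illustration only, that $g(X) = X$ with $X$ exponentially distributed with mean 1 (so $E[X^k] = k!$), $n = 2$ and $a = 1.5$; here $E[X^3] = 6 > aE[X]E[X^2] = 3$, so the hypotheses of the proposition hold. The function names are hypothetical.

```python
from math import comb, factorial


def poly_coeffs(moments, a, n):
    """Coefficients of sum_k n^-k C(n, k) (E[g^(k+1)] - a E[g] E[g^k]) lambda^k, k = 0..n.

    moments[k] holds E[g(X)^k] for k = 0, ..., n + 1.
    """
    return [n ** -k * comb(n, k) * (moments[k + 1] - a * moments[1] * moments[k])
            for k in range(n + 1)]


def evaluate(coeffs, lam):
    return sum(ck * lam ** k for k, ck in enumerate(coeffs))


def solve_positive_root(moments, a, n, tol=1e-14):
    coeffs = poly_coeffs(moments, a, n)
    assert coeffs[0] < 0 < coeffs[-1], "need a > 1 and E[g^(n+1)] > a E[g] E[g^n]"
    lo, hi = 0.0, 1.0
    while evaluate(coeffs, hi) <= 0:   # a positive leading coefficient guarantees termination
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if evaluate(coeffs, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, a = 2, 1.5
moments = [float(factorial(k)) for k in range(n + 2)]   # E[X^k] = k! for X ~ Exp(1)
lam = solve_positive_root(moments, a, n)
print(lam)
```

In this example the polynomial is $-0.5 + 0.5\lambda + 0.75\lambda^2$, whose positive root is $(-0.5 + \sqrt{1.75})/1.5$, which the bisection reproduces.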

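Proof of Proposition 4. By Theorem 4,

$$\tilde f(x,y) = g(x) \times \tilde f(y|x),$$

where

$$\tilde f(y|x) = \frac{e^{\lambda^t y} f(y|x)}{\int e^{\lambda^t y} f(y|x)\,dy}.$$

Here the superscript $t$ denotes the transpose. Now $f(y|x)$ is the $k$-variate normal density with mean vector

$$\mu_{y|x} = \mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x)$$

and variance-covariance matrix

$$\Sigma_{y|x} = \Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}.$$

Hence $\tilde f(y|x)$ is the normal density with mean $(\mu_{y|x} + \Sigma_{y|x}\lambda)$ and variance-covariance matrix $\Sigma_{y|x}$. Now the moment constraint equation (25) implies

$$a = \int_{x \in \mathbb{R}^{N-k}} \int_{y \in \mathbb{R}^k} y\, \tilde f(x,y)\,dy\,dx = \int_{x \in \mathbb{R}^{N-k}} g(x) \left( \int_{y \in \mathbb{R}^k} y\, \tilde f(y|x)\,dy \right) dx = \int_{x \in \mathbb{R}^{N-k}} g(x) \left( \mu_{y|x} + \Sigma_{y|x}\lambda \right) dx$$

$$= \int_{x \in \mathbb{R}^{N-k}} g(x) \left( \mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x) + \Sigma_{y|x}\lambda \right) dx = \mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}\left(E_g[X] - \mu_x\right) + \Sigma_{y|x}\lambda.$$

Therefore, to satisfy the moment constraint, we must take

$$\lambda = \Sigma_{y|x}^{-1}\left[ a - \mu_y - \Sigma_{yx}\Sigma_{xx}^{-1}\left(E_g[X] - \mu_x\right) \right].$$

Putting the above value of $\lambda$ in $(\mu_{y|x} + \Sigma_{y|x}\lambda)$, we see that $\tilde f(y|x)$ is the normal density with mean

$$a + \Sigma_{yx}\Sigma_{xx}^{-1}\left(x - E_g[X]\right)$$

and variance-covariance matrix $\Sigma_{y|x}$. □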
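In matrix form, Proposition 4 only requires $E_g[X]$ and the prior Gaussian parameters: $\lambda$ solves a linear system in $\Sigma_{y|x}$, and the posterior conditional mean simplifies to $a + \Sigma_{yx}\Sigma_{xx}^{-1}(x - E_g[X])$. The NumPy sketch below computes both quantities so that they can be compared. The covariance matrix, dimensions, $E_g[X]$ and target $a$ are illustrative numbers, and ordering the coordinates with $x$ first and $y$ last is our convention.

```python
import numpy as np


def posterior_conditional(mu, sigma, n_x, a, eg_x):
    """Return (lambda, mean_fn, Sigma_{y|x}) for the tilted conditional of Y given X.

    mu, sigma: prior mean and covariance of (X, Y), with X the first n_x coordinates;
    a: target E[Y]; eg_x: E_g[X] under the prescribed marginal g of X.
    """
    mu_x, mu_y = mu[:n_x], mu[n_x:]
    s_xx, s_xy, s_yy = sigma[:n_x, :n_x], sigma[:n_x, n_x:], sigma[n_x:, n_x:]
    reg = np.linalg.solve(s_xx, s_xy).T               # Sigma_yx Sigma_xx^{-1}
    s_y_given_x = s_yy - reg @ s_xy                    # Sigma_{y|x}
    lam = np.linalg.solve(s_y_given_x, a - mu_y - reg @ (eg_x - mu_x))

    def mean_fn(x):
        prior_mean = mu_y + reg @ (x - mu_x)           # mu_{y|x}
        return prior_mean + s_y_given_x @ lam          # mean of the tilted normal

    return lam, mean_fn, s_y_given_x


rng = np.random.default_rng(0)
m = rng.normal(size=(4, 4))
sigma = m @ m.T + 4.0 * np.eye(4)       # an arbitrary positive-definite prior covariance
mu = np.array([0.0, 0.1, 0.2, 0.3])
a = np.array([0.5, -0.2])               # target E[Y]
eg_x = np.array([0.4, -0.1])            # E_g[X] under the prescribed marginal of X
lam, mean_fn, cov = posterior_conditional(mu, sigma, 2, a, eg_x)
print(lam, mean_fn(eg_x))
```

Because the posterior conditional mean is affine in $x$, averaging it over $g$ gives its value at $E_g[X]$, which is exactly $a$; this is the moment constraint (25).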
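Proof of Theorem 8. We have

$$\tilde f(y|x) = D \exp\left\{ -\frac{1}{2}\left(y - \tilde\mu_{y|x}\right)^t \Sigma_{y|x}^{-1} \left(y - \tilde\mu_{y|x}\right) \right\}$$

for an appropriate constant $D$, where $\tilde\mu_{y|x}$ denotes $a + \frac{1}{\sigma_{xx}}\Sigma_{yx}\left(x - E_g[X]\right)$ and $\Sigma_{yx} = (\sigma_{xy_1}, \ldots, \sigma_{xy_k})^t$.

Suppose that the stated assumptions hold for $i = 1$. Under the optimal distribution, the marginal density of $Y_1$ is

$$\tilde f_{Y_1}(y_1) = \int_{(x, y_2, \ldots, y_k)} D \exp\left\{ -\frac{1}{2}\left(y - \tilde\mu_{y|x}\right)^t \Sigma_{y|x}^{-1} \left(y - \tilde\mu_{y|x}\right) \right\} g(x)\,dx\,dy_2 \cdots dy_k.$$

Now the limit in (28) is equal to

$$\lim_{y_1 \to \infty} \int_{(x, y_2, y_3, \ldots, y_k)} D \exp\left\{ -\frac{1}{2}\left(y - \tilde\mu_{y|x}\right)^t \Sigma_{y|x}^{-1} \left(y - \tilde\mu_{y|x}\right) \right\} \times \frac{g(x)}{g(y_1)}\,dx\,dy_2 \cdots dy_k.$$

The term in the exponent is

$$-\frac{1}{2}\left(y - a' - \frac{x}{\sigma_{xx}}\Sigma_{yx}\right)^t \Sigma_{y|x}^{-1} \left(y - a' - \frac{x}{\sigma_{xx}}\Sigma_{yx}\right),$$

where $a' = (a_1', \ldots, a_k')^t$ with $a_i' = a_i - \frac{E_g[X]}{\sigma_{xx}}\sigma_{xy_i}$.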

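We make the following substitutions:

$$(x, y_2, y_3, \ldots, y_k) \mapsto y' = (y_1', y_2', y_3', \ldots, y_k'),$$

$$y_1' = (y_1 - a_1') - \frac{\sigma_{xy_1}}{\sigma_{xx}}\, x, \qquad y_i' = (y_i - a_i') - \frac{\sigma_{xy_i}}{\sigma_{xx}}\, x, \quad i = 2, 3, \ldots, k.$$

Assuming that $\sigma_{xy_1} = \mathrm{Cov}(X, Y_1) \neq 0$, the inverse map

$$y' = (y_1', y_2', y_3', \ldots, y_k') \mapsto (x, y_2, y_3, \ldots, y_k)$$

is given by

$$x = \frac{\sigma_{xx}}{\sigma_{xy_1}}\left(y_1 - y_1' - a_1'\right), \qquad y_i = y_i' + a_i' + \frac{\sigma_{xy_i}}{\sigma_{xy_1}}\left(y_1 - y_1' - a_1'\right), \quad i = 2, 3, \ldots, k,$$

with Jacobian

$$\left| \det \frac{\partial(x, y_2, y_3, \ldots, y_k)}{\partial(y_1', y_2', y_3', \ldots, y_k')} \right| = \frac{\sigma_{xx}}{\sigma_{xy_1}}.$$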
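The integrand becomes

$$D \exp\left\{ -\frac{1}{2} y'^t \Sigma_{y|x}^{-1} y' \right\} \frac{g\left( \frac{\sigma_{xx}}{\sigma_{xy_1}}(y_1 - y_1' - a_1') \right)}{g(y_1)}\, \frac{\sigma_{xx}}{\sigma_{xy_1}}.$$

By assumption,

$$\frac{g\left( \frac{\sigma_{xx}}{\sigma_{xy_1}}(y_1 - y_1' - a_1') \right)}{g(y_1)} \le h(y_1') \quad \text{for all } y_1,$$

for some non-negative function $h(\cdot)$ such that $E h(Z) < \infty$ when $Z$ has a Gaussian distribution. We therefore have, by the dominated convergence theorem,

$$\lim_{y_1 \to \infty} \int D \exp\left\{ -\frac{1}{2} y'^t \Sigma_{y|x}^{-1} y' \right\} \frac{g\left( \frac{\sigma_{xx}}{\sigma_{xy_1}}(y_1 - y_1' - a_1') \right)}{g(y_1)}\, \frac{\sigma_{xx}}{\sigma_{xy_1}}\,dy'$$

$$= \int D \exp\left\{ -\frac{1}{2} y'^t \Sigma_{y|x}^{-1} y' \right\} \lim_{y_1 \to \infty} \frac{g\left( \frac{\sigma_{xx}}{\sigma_{xy_1}}(y_1 - y_1' - a_1') \right)}{g(y_1)}\, \frac{\sigma_{xx}}{\sigma_{xy_1}}\,dy'$$

$$= \int D \exp\left\{ -\frac{1}{2} y'^t \Sigma_{y|x}^{-1} y' \right\} \lim_{y_1 \to \infty} \frac{g\left( \frac{\sigma_{xx}}{\sigma_{xy_1}}(y_1 - y_1' - a_1') \right)}{g(y_1 - y_1' - a_1')}\, \lim_{y_1 \to \infty} \frac{g(y_1 - y_1' - a_1')}{g(y_1)}\, \frac{\sigma_{xx}}{\sigma_{xy_1}}\,dy',$$

which, by our assumption on $g$, in turn equals

$$\int D \exp\left\{ -\frac{1}{2} y'^t \Sigma_{y|x}^{-1} y' \right\} \times \left( \frac{\sigma_{xy_1}}{\sigma_{xx}} \right)^{\alpha} \frac{\sigma_{xx}}{\sigma_{xy_1}}\,dy' = \left( \frac{\sigma_{xy_1}}{\sigma_{xx}} \right)^{\alpha - 1}. \qquad \square$$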
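The tail statement can be checked numerically in the simplest case $k = 1$. Under the optimal distribution, $Y_1 = a_1' + bX + \varepsilon$ with $b = \sigma_{xy_1}/\sigma_{xx} > 0$, $X \sim g$, and $\varepsilon$ an independent centred normal variable with variance $s^2 = (\Sigma_{y|x})_{11}$. The Python sketch below assumes, for illustration only, the Pareto-type density $g(x) = (\alpha - 1)/(1+x)^{\alpha}$ on $x \ge 0$ with $\alpha = 3$, together with illustrative values of $b$, $a_1'$ and $s$. It computes $\tilde f_{Y_1}(y)$ by Simpson's rule and compares $\tilde f_{Y_1}(y)/g(y)$ with $b^{\alpha - 1}$.

```python
import math

ALPHA, B, A1_PRIME, S = 3.0, 0.5, 0.3, 1.0   # illustrative values


def g(x: float) -> float:
    """Pareto-type density (alpha - 1) / (1 + x)^alpha on [0, inf)."""
    return (ALPHA - 1.0) / (1.0 + x) ** ALPHA if x >= 0.0 else 0.0


def y1_density(y: float, n: int = 2000, width: float = 12.0) -> float:
    """Density of Y1 = A1_PRIME + B * X + N(0, S^2) with X ~ g, by Simpson's rule.

    Only x within `width` conditional standard deviations of (y - A1_PRIME) / B
    contribute noticeably, so the integral is restricted to that window.
    """
    x0 = (y - A1_PRIME) / B
    half = width * S / B
    lo = x0 - half
    h = 2.0 * half / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        w = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)   # Simpson weights
        z = (y - A1_PRIME - B * x) / S
        total += w * math.exp(-0.5 * z * z) / (S * math.sqrt(2.0 * math.pi)) * g(x)
    return total * h / 3.0


for y in (1e2, 1e3, 1e4):
    print(y, y1_density(y) / g(y), B ** (ALPHA - 1.0))
```

The ratio approaches $b^{\alpha-1} = 0.25$ as $y$ grows, so $Y_1$ inherits the polynomial tail of $X$ up to a constant factor.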
Figure 1: Solution range as a function of correlation and $n$.